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Abstract — The biconjugate gradient (BCG) method is the “natural” gener- 
alization of the classical conjugate gradient algorithm for Ilcrmitian positive defi- 
nite matrices to general noii-IIcrmitian linear systems. Unfortunately, the original 
BCG algorithm is susceptible to possible breakdowns and numerical instabilities. 
Recently, Freund and Naclitigal have proposed a novel BCG-type approach, the 
quasi-iuiuimal residual method (QMR), which overcomes the problems of BCG. 
Here, we present an implementation of QMR based on an s— stcj> version of the 11011- 
syinmetric look-ahead Lanczos algorithm. The main feature of the s — step Lanczos 
algorithm is that, in general, all inner products, except for one, can be computed 
in parallel at the end of each block; this is unlike the standard Lanczos process 
where inner products are generated sequentially. The resulting implementation of 
QMR is particularly attractive on massively parallel SIMD architectures, such as 
the Connection Machine. 


•This work was supported in part by DARPA via Cooperative Agreement NCC 2-387 between 
NASA and the Universities Space Research Association (USRA). 
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Introduction 


We are concerned with the iterative solution of large sparse linear systems 

Ax = b, ( l) 

where A is a nonsingular, in general non-IIermitian N X N matrix. Some of the most 
efficient iterative schemes for (1) are Krylov subspace methods : for any initial guess 
Xq G C^, they generate approximations to A~*b of the form 

x n € X 0 + A'n(r 0 , A), n = 1,2,..., (2) 


where r 0 = b — Ax 0 and 


/v n (r 0> A) = span {r 0 , /lr 0 , . . A 71 Vo} (3) 

is the nth Krylov subspace generated by ro and A. For example, the generalized mini- 
mal residual algorithm (GMRES) of Saad and Schultz [8] and the biconjugate gradient 
algorithm (BCG) of Lanczos [6] both satisfy (2). Unfortunately, for methods like GM- 
RES, work and storage requirements per iteration grow linearly with n and, therefore, 
versions with restarts are used in practice, which often results in slow convergence. 
In contrast, for BCG, work and storage requirements per iteration are constant and 
low. However, BCG typically exhibits a rather irregular convergence behavior and the 
method can even break down. 


The QMR Approach 

In [3], Freund and Nachtigal have proposed a BCG-type approach, the quasi- 
minimal residual algorithm (QMR), which overcomes the problems of BCG. The 
method uses an implementation developed by Freund, Gutknccht, and Nachtigal [1, 2] 
of the nonsymmctric Lanczos algorithm [5] with look-ahead [7] to generate basis vectors 
for the Krylov subspaccs (3). More precisely, with 

VM = [v { v 2 •••»«] = [Vi V 2 ••• Vi], (<l) 

Vk — [u nfc i], k 1, — /(n), 


we have 

h’n(r 0 ,A) = {K (n) z | z <E C n } for » = 1,2,.... (5) 

The blocks V* in (4) just contain the vectors corresponding to the fclli look-ahead 
Lanczos step of length 

hk = ”/t+i - n k- 

In the sequel, wc refer to the first vectors v„ k in each block as regular vectors , while 
the remaining vectors arc called inner vectors. Furthermore, the relation 

/lyf") = y( n +Vj[( n ) (0) 


holds. Here 7/ (n) is an (n + 1) X n upj)cr Ilesscnberg matrix which is also block tridi- 
agonal with / diagonal blocks of size hk X hk, k = 1,2,...,/. In addition to the right 


n 
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Lanczos vectors t>i,u 2l ..., the look-ahead Lanczos algorithm generates left Lanczos 
vectors Wi, iu 2 , . . . such that 


A 7 ) ~ span {u^, iu 2 , . . .,tu n } for n = 1,2,... , 


and, as in (4), we set 


Wk = [w nk w nfc +i w njfc+1 _i], * = !»•--> f- 

These vectors are just constructed such that right and left Lanczos vectors correspond- 
ing to different look-ahead steps are biorthogonal, i.e., 


wjv h = 


0 if j / fc, 

£>* if J = *, 


ii ^ 


( 7 ) 


and, moreover, the matrices D * are all nonsingular. 

By means of (5) and (6), the nth iterate (2) of any Krylov subspace method and 
the corresponding residual vector can be written as follows: 

x n = x 0 + V^z n for some z n 6 C", (8) 

r n = b - Ax n = W n +«> (||r 0 || a ci -// (n) *„). (9) 


Here e\ denotes the first unit vector in Il n+1 . 

For the QMR method the parameter vector z n in (8) is chosen such that the Eu- 
clidean norm of the coefficient vector in the representation (9) is minimal, i.e., as 
solution of the least squares problem 


min 


n n (||r 0 || a Ci -7/ (n) *)|| 2 , 


( 10 ) 


where f} n = diag (||ui|| 2 , ||t» 2 || 2 , . . ||t> n +i|| 2 )- Here, Q n is chosen such that all basis vec- 
tors 3 = 1? • • -i »+ 1, in the representation (9) of r n have the same Euclidean 

length. Note that is an upper Ilessenberg matrix with full column rank n. 

Hence (10) always has a unique solution z n and the QMR iterate x n is well defined by 
(8) and (10). Finally, we remark that z n can be easily updated from step to step, and 
the resulting QMR algorithm can be implemented using only short recurrences (sec [3] 
for details). 


An s-Step Lanczos Algorithm with Look-Ahead 

To enforce the biorthogonality conditions (7), inner products of vectors of length 
N need to be computed. In the implementation of the look-ahead Lanczos algorithm 
described in [1, 2], this is done sequentially, i.e. inner products are calculated in each 
iteration step n. On a massively parallel machine, such as the Connection Machine, 
the sequential computation of these inner products represents a bottleneck. 

In this section, we sketch a version of the look-ahead Lanczos algorithm which 
overcomes this problem and is more suited for a parallel machine. In contrast to the 
sequential algorithm, where look-ahead steps of size hk > 1 are performed only if neces- 
sary to avoid breakdowns of the Lanczos process, the philosophy of the 5-step Lanczos 
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algorithm is to construct Lanczos blocks of given size hk = s, whenever possible. This is 
clone by first generating s — 1 intermediate inner vectors by means of simple three-term 
recurrences 


^n+l — Cn^n Vn^n— 1 > (H) 

w n +i = A T w n - ( n w n - (12) 

with suitably chosen coefficients ( n ,7/ n , and rj nk = 0. The biorthogonality conditions 
(7) are then enforced only at the end of each block. This has the advantage that all 
inner products arising in the biorthogonalization process for the inner vectors of a whole 
block can be computed in parallel. We remark that to enforce (7) for the inner vectors 
in block I, it is sufficient to biorthogonalize them only against the vectors from the 
previous blocks / = /(n), / + 1, . . . , / using 

t’n = v n -V J D] l Wjv n -...-V l - l Dr_\\Vl l v n (13) 

u, n = W n - WjDfvJw n - ... - Wt^D^V^Wn. (M) 

Moreover, in general, only one previous block occurs in (13) and (14), i.e., / = / — 1. 

In [4], Kim and Chronopoulos proposed an s~step Lanczos algorithm using a fixed 
block size s throughout the whole process. Our numerical tests show that such an 
approach is not viable. In order to obtain a robust implementation of the s— step 
Lanczos algorithm, it is crucial to keep the block size variable and combine the process 
with a suitable look-ahead strategy. 

In the following algorithm, wc outline the s~step look-ahead Lanczos method which 
we propose. In each block step, the algorithm tries to build a block of size s. If the 
construction of such a block would lead to a singular or a nearly singular matrix Di in 
(7) or to a new pair r nj+1 and tu n , + ] of regular vectors which have dominant components 
in the old Krylov subspaces A\, e (t?|,/1) or K ni {w \ , /t T ), we either build a smaller block 
or, by performing sequential steps, a bigger block. 

Algorithm. Sketch of 3— step Lanczos algorithm with look-ahead 

0) Set V\ - ro/||r 0 || 2 and choose uq £ with ||uq|| 2 = 1; 

Set n\ = 1, / = 1, Co = Wo = 0; 

For l = 1,2,...: 

1) Compute s - 1 intermediate inner vectors via (tl) and (12) for n — 7 + 

s - 2; 

Set \ l — [u n j ’ ’ • Unj-fa— l]/ 11 l — [ Wjh • * * l], 

2) Construct the symmetric matrix \\ T J \); 

3) (Biorthogonalization of inner vectors.) 

Determine f by rif = ma x{iij | iij < — s + 1}; 

For n = n/ + 1, ...,??/ + 3 - 1, compute v n and w n via (13) and (14); 

= ® 01 Il^'^ll2 = $topf 

Set V/ — [u ?lJ ' ' ' i]> If / — + l]j 

4) Construct the symmetric matrix D\ = 1 Vf V( ; 
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5) Decide whether to construct v n ,+ a and w ni+3 as regular vectors or to reduce the 
block size and go to 8) or 6), respectively; 

6) If it is possible to construct regular vectors and w ni + 9 for s < s: 

set n; +1 = n { + s, V; = [u n , • • • u n , + ,_i], Wi = [u> n , • • • u>„ 1+ ,-i], and go to 8); 
Otherwise , try to increase the block size s by sequential steps : 
set 5 — s; 

Loop : 

Set s = 5 + 1, n = n/ + s — 2, compute and u> n +i via (11) and (IS), and 

biorthogonalize immediately: 

determine f by nj = max{nj | nj < n( — 5 + 1} and compute u n +i an d w n +i using 
formulas (13) and ( 1 1) (with n replaced by n+ l); If\\v n +i\\ 2 = 0 or ||tu n +i|| 2 = 0, 
slop; 

Set V( = [V[ v n +\], \V( = [I V\ iy n+ j] and update the matrix Wj V\ ; 

This loop is terminated if we can construct regular vectors v ni + 3 and w nt +$ or if 
we have reached the maximum block size. In the first case, go to 8), in the second 
case, go to 7); 

7) Determine the smallest value which failed the checks and update the upper bound 
n(A) to this value. The block is now enforced to close. Let its size be s and set 
"J + 1 = n t + s, V, = [u n , • ■ • v„ (+ j_i] Wi = [?«„, ■ • • u>„ |+i _i]; 

8) (Construct regular vectors r n(+1 and w ni + x .) 

Set n — ni+i, v n = Av n -\, tb n — A T w n -\, and compute 

Vn = V n - Vi-xD^Wl.K - V,D^W?'v n , 

W n = w n -]Vi_ l DfJ l Vi’l l w n -W l Df T V l T iv n -, 

//pn|| 2 = o or ||w n || 2 = 0, slop; 

Otherwise, set v n = v n /||6 n || 2 and w n = u> n /||u? n || 2 ; 

9) Construct the Ith blocks of the block tridiagonal matrix 7/( n-1 ) and set l = / + 1. 

We note that the quantity u(/t) in step 7) is an estimate of the norm of the matrix 
A which is used for our checks to guarantee that the Lanczos vectors remain sufficiently 
linearly independent, A similar concept was first introduced for the sequential look- 
ahead Lanczos algorithm in [1], These checks, the criteria for the decision in step 5), and 
further details of the algorithm will be presented in a forthcoming paper. Here, we only 
remark that the important properties (5), (G), and (7), which were used in the derivation 
of the QMR method, remain valid also for the 5-slcp Lanczos algorithm with look- 
ahead. Also, we note that the above algorithm can be realized with the same number of 
inner products as in the classical nonsymmetric Lanczos method without look-ahead. 
In particular, the s X s matrix WfVi in step 2) can be constructed by computing only 
25—1 inner products, rather than s 2 as the straightforward approach would suggest. 
Moreover, in step 4), the matrix D( can be updated from \Vf using only already 
available inner products. Finally, numerical experiments with an implementation of 
the resulting QMIt algorithm on the CM-2 will be reported elsewhere. 
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Abstract. We consider Krylov subspace methods for the solution of large sparse linear 
systems .4x = b with complex non-Hermitian coefficient matrices. Such linear systems arise 
in important applications, such as inverse scattering, numerical solution of time-dependent 
Schrodinger equations, underwater acoustics, eddy current computations, numerical com- 
putations in quantum chromodynamics, and numerical conformal mapping. Typically the 
resulting coefficient matrices A exhibit special structures, such as complex symmetry, or 
they are shifted Hermit ian matrices. In this paper, we first describe a Krylov subspace 
approach with iterates defined by a quasi-minimal residual property, the QMR method, 
for solving general complex non-Hermitian linear systems. Then, we study special Krylov 
subspace methods designed for the two families of complex symmetric respectively shifted 
Hermitian linear systems. We also include some results concerning the obvious approach 
to general complex linear systems by solving equivalent real linear systems for the real and 
imaginary parts of x. Finally, numerical experiments for linear systems arising from the 
complex Helmholtz equation are reported. 
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1. Introduction 


In this chapter, we make some introductory remarks about Krylov subspace methods and 
list applications where complex linear systems arise. Furthermore, tve gi^c tui outline of 
the thesis and introduce some notation. 

1.1. Krylov subspace methods 

One of the most frequently encountered tasks in numerical computations is the solution of 
nonsingular systems of linear equations 

Ax = b. (!■!) 

Often, as for linear systems resulting from finite difference or finite element approximations 
to partial differential equations (PDE’s), the coefficient matrix A of (1.1) is very large, 
but sparse. A natural way to exploit the sparsity of A in the solution process of (1.1) 
is to use iterative techniques which involve the coefficient matrix A only in the form of 
matrix-vector products. Most iterative schemes of this type fall into the category of Krylov 
subspace methods: they produce approximations i„, n = 1,2, . . to A -1 6 of the form 

x n € Xq + K n {ro, A). (I--) 

Here x 0 is any initial guess for (1.1), r Q = b — Ax o the corresponding residual \ector, and 

AA(r 0 ,A) = span{r 0 , Ar 0 , A n_1 r 0 } 

is the nth Krylov subspace generated by r 0 and A. Two classical examples of Krylov 
subspace methods are the conjugate algorithm (CG hereafter) due to Hestenes and Stiefel 
[HS] and Chebyshev iteration [GV], which are both methods for the solution of linear 
systems (1.1) with Hermitian positive definite coefficient matrices A. Especially CG is one 
of the most powerful techniques for solving Hermitian positive definite linear systems. Its 
success has prompted extensive research into generalizations of the method to indefinite 
and non-Hermitian matrices and a number of CG-like Krylov subspace methods have been 
proposed (see, e.g., [Sto, SSI, Saa2] for surveys). Besides CG-like schemes, the second 
important subclass of Krylov subspace methods are semi-iterative algorithms modeled 
after Chebyshev iteration. Eiermann, Niethammer, and Varga [ENV] have established a 
theory for methods of this type for non-Hermitian linear systems. 

In this thesis, we are mainly concerned with CG-like Krylov subspace methods. 
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1.2. Ideal Krylov subspace methods for non-Hermitian matrices 

Classical CG has two outstanding features. First, its iterates (1.2) are characterized by a 
minimization property. Secondly, x n can be generated cheaply, by means of simple three- 
term recurrences. For general non-Hermitian matrices, the situation is less satisfactory. 
An ideal CG-like Krylov subspace method for solving non-Hermitian linear systems (1.1) 
would have features similar to the classical CG algorithm. It would produce iterates x n in 
(1.2) which: 

(i) are characterized by a minimization property over A n (ro..4), such as the minimal 
residual property 

||& — Ax n || = min ||i — Ax||, € xo + K n { r o > ^4); (1-3) 

*€*0+K'n( r 0 A) 

(ii) can be computed with little work and low storage requirements per iteration. 
Unfortunately, it turns out that, for general non-Hermitian matrices, one cannot fulfill (i) 
and (ii) simultaneously. This result is due to Faber and Manteuffel [FM1, FM2] v.ho ha\e 
shown that CG-type algorithms with (i) and (ii) exist essentially only for matrices of the 
special form 

A = e' e (T + ial) where T = T H is Hermitian, a, 0 6 R, (1-4) 

(see also Voevodin [Voe] and Joubert and Young [J\]). Instead, most CG-tjpe methods 
for non-Hermitian linear systems satisfy either (i) or (ii). 

In the first category, the most successful scheme is the generalized minimal residua, 
algorithm (GMRES hereafter) due to Saad and Schultz [SS2]. It produces the iterates 
defined by (1.3) and thus fulfills (i). However, it violates (ii). since work and storage per 
iteration grow linearly with the iteration number. Consequently, in practice, one cannot 
afford to run the full algorithm and it is necessary to use restarts. For difficult problems, 
this often results in very slow convergence. 

In the second category, the archetype is the biconjugate gradient algorithm (BCG 
hereafter) which goes back to Lanczos [Lan2] and, later on, was revived by Fletcher [Fie]. 
BCG is based on simple three-term recurrences, which keep work and storage requirements 
constant at each iteration. However, the BCG iterates are defined by a Galerkin condition 
rather than a minimization property (i), which means that the algorithm can exhibit 
and typically does — a rather irregular convergence behavior with wild oscillations in 
the residual norm. Furthermore, in the BCG algorithm, breakdowns more precisely, 
division by 0 — may occur. In finite precision arithmetic, such exact breakdowns are 
very unlikely; however, near-breakdowns may occur, leading to numerical instabilities in 
subsequent iterations. Recently, two modifications of BCG, namely CGS [Son] and Bi- 
CGSTAB [Van], have been proposed. However, while these methods seem to work well in 
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many cases, they do not address the problem of breakdowns, and thus they too, like BCG. 
are susceptible to instabilities. In exact arithmetic, both CGS and Bi-CGSTAB break 
down every time BCG does. 

1.3. Complex linear systems 

While most linear systems which arise in practice have real coefficient matrices A and real 
right-hand sides b, there are some important applications which lead to complex linear 
systems. PDE’s which model dissipative processes (see, e.g., [Pie, Chapter 10], [Mar]) 
usuallv involve complex coefficient functions and/or complex boundary conditions [BGuT, 
KG], and discretizing them yields linear systems with complex matrices .4. A typical 
example for this category is the complex Helmholtz equation 

— Au - <7\U + icr^u — f, (1-5) 

where o\, <Ji are real coefficient functions, which describes the propagation of damped 
time-harmonic waves as, e.g., electromagnetic waves in conducting media [EH, Chapter 8]. 
Equations of type (1.5) also arise in situations where damping is usually negligible, as in 
long-range wave propagation problems in underwater acoustics [BGoT, Gol, SLJ], where, 
by means of parabolic approximation techniques [Tap] and discretization in range direction, 
the computation of three-dimensional wave propagation is reduced to the solution of a 
two-dimensional complex Helmholtz equation at each range step. Further applications, 
which give rise to complex linear systems, include discretizations of the time-dependent 
Schrodinger equation 

i^ = -Au + k(u) (1-6) 

at 

using implicit difference schemes [DFP], electromagnetic inverse scattering problems [PM. 
SPM], eddy current computations [BHST], numerical computations in quantum chromo- 
dynamics [BBGRM], and numerical conformal mapping [Tru], 

In all these examples, the resulting coefficient matrices -4 are non-Hermitian. How- 
ever, they still exhibit special structures. Often, as for the linear systems resulting from 
(1.6), -4 is a shifted Hermitian matrix, i.e., a matrix of the form (1.4). In most other 
cases, which lead to complex systems, as for the linear systems resulting from the complex 
Helmholtz equation (1.5) with first-order boundary conditions, the coefficient matrix is 
complex symmetric: 

A = A t . (1-7) 

Note that the two families (1.4) and (1.7) overlap. The matrix (1.4) is complex symmetric 
if, and only if, T is real. 

Surprisingly, when the resulting linear systems (1.1) are solved in practice, usually 
no attempt is made to exploit the special structures (1.4) or (1.7). Indeed, there are two 
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popular approaches. The first one (see, e.g ., [BG]) is to apply preconditioned CG to the 
Hermitian positive definite normal equations 

A H Ax = A H b. (1-8) 

Of course, complex numbers can always be avoided by rewriting (1.1) as a real linear 
system for the real and imaginary parts of x. The second popular approach is to solve 
this real and, in general, nonsymmetric linear system by one of the CG-like methods, for 
example GMRES. It turns out that in both cases the resulting iterative schen.es tend to 
converge slowly. As a consequence, complex linear systems have the bad reputation of 

being difficult to solve by CG-tvpe methods. 

Finally, we mention two applications for which shifted linear systems 

Ax = b, A = M + <?I, ^ 

where M and b are real and fixed, a G C, 

need to be solved repeatedly for different shifts a. This situation arises when real parabolic 
equations are solved using high-order implicit methods (see, e.g., [GSl, GS2j). Further- 
more, linear systems (1.9) also come up in the context of frequency response computation 

in control theory [Lau]. 

1.4. Overview of the thesis 

The purpose of this thesis is twofold. First, we present a novel BCG-like approach for 
general nonsingular non-Hermitian linear systems (1.1), the quasi-minimal residual algo- 
rithm (QMR hereafter), which overcomes the problems of BCG. The QMR method was 
first proposed by Freund [Fre4j for the special case of complex symmetric linear systems 
and recently extended to general non-Hermitian matrices by Freund and Nachtigal [FN1. 
FN2]. The QMR approach uses a look-ahead variant of the nonsymmetric Lanczos process 
to generate basis vectors for the Krylov subspaces K n (r 0 ,A). The look-ahead Lanczos 
approach was first proposed by Taylor [Tav] and Parlett, Taylor, and Liu [PTL]. For the 
QMR method, we use the implementation of the look-ahead Lanczos process which was 
recently developed by Freund. Gutknecht, and Nachtigal [FGN, FN1]. Using the Lanc- 
zos basis, the actual QMR iterates are then defined by a relaxed version of (1.3), namely 
a quasi-minimal residual property. The QMR approach can be implemented using only 
short recurrences and hence it still satisfies the requirement (ii) for an ideal Krylov subspace 
method. The quasi-minimal residual property ensures that QMR, unlike BCG, converges 
smoothly; moreover, existing BCG iterates can also be easily and stably recovered from 
the QMR process. Finally, for the QMR method, it is possible to obtain error bounds 
which are essentially the same as the standard bounds for GMRES. To the best of our 
knowledge, this is the first convergence result for a BCG-like algorithm. 
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Second, we present CG-type methods which exploit the special structures (1.4) re- 
spectively (1.7). In particular, we show that, for complex symmetric matrices, work and 
storage for the QMR approach can be halved. For shifted Hermitian matrices (1.4). we 
propose and analyze three different CG-type methods based on the minimal residual prop- 
erty (1.3), a Galerkin condition, and an Euclidean error minimization property. For the 
practical use of CG-type methods it is crucial that they can be combined with efficient 
preconditioners. Unfortunately, the more classical techniques, such as incomplete factor- 
ization, lead to preconditioned matrices which in general are no longer in the class (1.4). 
We show that this problem can be resolved and the special structure of the matrices (1*4) 
preserved by using polynomial preconditioning, and results on the optimal choice of the 
preconditioner axe given. Note that polynomial preconditioning is an attractive approach 
for vector and parallel computers and, because of that, has become very popular in recent 
years (see [Saa2] for a survey). 

Finally, we also present some results which indicate that for Krylov subspace methods 
it is always preferable to solve the original complex linear system rather than equivalent 
real ones. 

The outline of this thesis is then as follows. In Section 2, we are concerned with 
the nonsymmetric Lanczos process. In particular, we sketch the implementation of the 
look-ahead Lanczos algorithm proposed in [FGN, FN1]. In Section 3, we present the QMR 
method for general nonsingular non-Hermitian matrices. In Section 4, we consider CG- 
type algorithms for complex symmetric matrices. In Section 5, we study CG-like methods 
for shifted Hermitian matrices. In Section 6, we are concerned with the issue complex 
versus equivalent real linear systems”. In Section 7, we present some numerical examples 
for complex symmetric and shifted Hermitian linear systems. Finally, in Section S, we 
make some concluding remarks. 
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1.5. Notation 

Throughout this thesis, all vectors and matrices are assumed to be complex in general. As 
usual, i = y/—\. For any matrix M = [m^t ], we use the following notation: 

M = [mj^] = the complex conjugate of M, 

M T — [ m* ; ] = the transpose of M, 

M h — M T = the Hermitian of M, 

Re M = (M + JI)/ 2 = the real part of M, 

Im M = (M - - the imaginary part of M, 

a mix (M) = the largest singular value of M, 

^min(A/) = the smallest singular value of Af, 

||M|| = cr m3tX (M ) = the 2— norm of M. 

For any vector c€ C m and any matrix B € C mXm , we use the following notations: 

||c|| = \fcMc — Euclidean norm of c, 

||c||b = y/ c H Be = B— norm of c. if B is Hermitian positive definite, 

A (B) = the set of eigenvalues of B. 

A rai X (B) = the largest eigenvalue of B , if B is Hermitian, 

= the smallest eigenvalue of B , if B is Hermitian, 

A'„(c, B) = span{c, Be , .... B n ~ l c) 

= the nth Krylov subspace of C m generated by c and B. 

Furthermore, we denote by 

ej n) =[0 •••0 10 ••• o] r e R n 

T 

j 

the jt h unit vector of length n and by I n the n x n identity matrix. If the dimension n is 
evident from the context, we will simply write t } and L We denote by 

Tl n = = (Jo + 0*1 A + * * * + <Jn A n | CTo, <7 n € C} 

and n^ r) = {$(A) = &o + U\ A + ■ - ■ + (7 n A | cro , o*i , . - - , ^ R} 

the set of complex and real polynomials of degree at most n, respectively. Frequently, we 
will make use of the relation 

AT n (c,B) = {$(5)c|$€n„_ 1 }, 71 = 1,2,... • ( 110 ) 
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Throughout this thesis, X denotes the dimension of the coefficient matrix A of (1.1) 
and A € C' V * W is in general non-Hermitian. Moreover, we use the following notation: 

j 0 = initial guess for (1.1), 
x n = nth iterate, 

r n = b - Ax n = nth residual vector, 
i> n = nth right Lanczos vector, 
w n = nth left Lanczos vector. 

If it is not evident from the context which iterative method we are considering, quantities 
of different algorithms will be distinguished by superscripts, e.g., x$ MR and x 

Finally, one more note. In our formulations of the nonsymmetric Lanczos algorithm 
and of BCG, we use A T rather than A H . This was a deliberate choice in order to avoid 
complex conjugation of the scalars in the recurrences; the algorithms can be formulated 
equally well in either terms. 
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2. An implementation of the look-ahead Lanczos process for non-Hermitian 
matrices 

In this chapter, we first recall the classical nonsymmetric Lanczos method and its close 
relationship with formally orthogonal polynomials (FOP s hereafter). Next, we describe 
the basic idea of look-ahead Lanczos procedures, and finally, we present an actual imple- 
mentation of a Lanczos algorithm with look-ahead. 

2.1. The classical nonsymmetric Lanczos algorithm 

In 1950, Lanczos [Lanl] proposed the following algorithm for successive reduction of a 
general matrix A G C NxS ' to tridiagonal form. 

Algorithm 2.1. (Classical Lanczos method.) 

0) Choose r 0 , so E C N with r 0 , so 7^ 0; 

Set id = r 0 , W\ = So, t’o = u?o = 0; 

For n = 1, 2, . . . : 

1) Compute rj = w„v n ; 

If rj = 0: set L = n — 1, and stop; 

2) Otherwise, choose /3 n ,J n G C with d n 7n = Vi 
Set u n = i'nhn and w n = w n /3 n ; 

3) Compute a n = w^Av n ; 

Set v n + 1 = At’n — Qn^n — 3n v n-li 
Set U’n-t -1 = A T W n — Q„U' n — ') n W n -\. 


We refer to [Wil, pp. 3S8-394] for a detailed discussion of the Lanczos algorithm; in 
particular, proofs of the properties collected in Proposition 2.2 below can be found there. 
In the sequel, the notations 


V„ = [ Vi v 2 


i'n ] , W n = \wi w 2 u- n 

ot\ 0 • • • 0 


an d H fi 


7 2 a 2 

0 

0 ••• 


0 

/?„ 

0 7„ ot n 


will be used. Moreover, let 

L r = dimFTAr(r 0 , .4) and L/ = dim/<Ar(so,A ) 


( 2 . 1 ) 


( 2 . 2 ) 


(2.3) 
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denote the grade of ro with respect to .4 and the grade of so with respect to A . iespecti\el\, 
(cf. [Wil, p. 37]) and set 

L, = m\n{L r , Li} . (2-4) 

We remark that I r > 1 (I; > 1) is just the smallest integer such that the subspace 
A'Uro.A) (Al,( so<-4 7 ')) is ,4-invariant (A^-invariant). 

Proposition 2.2. 

a) In exact arithmetic, Algorithm 2.1 stops after a Unite number of steps n L 4- 1 and 
0 < L < L*. 

b) For k, n = 1, 2, . . . , L: 

T 

UJ fc V n = 

c) For n = 1,2 

A'„(r 0 ,.4) = span{ui,v 2 ,...,Ur.}, 

K n (s 0 ,A T ) = span{iui,tU2, ••■,«>*»}, 

AV n = V n H n + [ 0 0 ••• 0 v n + 1], 

A T W n = W n Hl + [ 0 0 0 i5„+i]. 

Note that the termination index L of Algorithm 2.1 is the smallest integer such that 


f °, if k ± n, (2.5) 

\ 1, if Jfc = T». 


^L+l^L+1 - 0- 


(2.S) 


There are two essentially different cases for fulfilling the termination condition (2.S). The 
first case, referred to as regular termination , occurs when u L+1 = 0 or u' L+1 = 0. If 
v L+1 = 0, then L = L r and the right Lanczos vectors ui,...,ut r span the .4-invariant 
subspace K Lr {r Q ,A). Similarly, if w L+l = 0, then L = L, and the left Lanczos vectors 
wi , . . . , wl, span the A^-invariant subspace I\’l, (so, A t ). Unfortunately, it can also happen 
that the termination condition (2.8) is satisfied with t’ i+1 0 and w L+1 0. This second 

case is referred to as serious breakdown [Wil, p. 3S9]. Note that, in this case, L < L* and 
the Lanczos vectors span neither an A-invariant nor an A^-invariant subspace of C 

It is the possibility of serious breakdowns, or, in finite precision arithmetic, of near- 
breakdowns , i. e., 


wl’vr, ~ 0, but w n 76 0 and v n jz 0, 


(2.9) 


that has brought the classical nonsymmetric Lanczos algorithm into discredit. However, 
by means of a look-ahead procedure, it is possible to leap (except in the very special case of 
an incurable breakdown [Tay]) over those iterations in which the standard algorithm would 
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break down. In the next section, using the intimate connection between the Lanczos 
process and FOP’s, we describe the basic idea of the Lanczos method with look-ahead. 

2.2. Orthogonal polynomials 

One readily verifies that the Lanczos vectors generated by Algorithm 2.1 are of the form 


v„ = $ n _i(.4)r 0 and w n = — — $„_i(A r )s 0 , (2.10) i 

7172 ■•■In PlP2 ■■ ■ Pn j 

where $„-i € LI n _i is a uniquely defined monic polynomial. Then, introducing the formal i 

inner product j 

($, 'L) := (^(.4 r )so) T ($(A)r 0 ) = s 0 r ¥(A)$(.4)r 0 (2.11) 

and using (2.6), (1.10), and (2.10), we can rewrite the biorthogonality condition (2.5) in I 

terms of polynomials: [ 

($„-!.$) = 0 for all $en„_ 2 (2.12) 

and 

($„_!,$„_,) #0. (2.13) 


Note that, except for the case of Hermitian A = A H (cf. Chapter 5), the formal inner 
product (2.11) is indefinite. Therefore, in the general case, there exist polynomials <5^0 
with ‘‘length” (<$,$)= 0 or even (<$, $) < 0. 

A polynomial $n_i G fU-i, $„_i ^ 0, that fulfills (2.12) is called a FOP (with 
respect to the formal inner product (2.11)) of degree n — 1 (see, e.g., [Bre], [Dra], [Gut]). 
Note that the condition (2.12) is empty for n = 1, and hence any $o = <^o ^ 0 is a FOP 
of degree 0. From (2.12), 

n - 1 ( -^ ) — C r 0 + Cr l4-|---- + <7 T J — l A ” 

is a FOP of degree n — 1 if. and only if. its coefficients cfq , <7 n _i are a nontrivial solution 

of the linear system 


PO P\ P2 • “ Pn- 2 ' 


<?Q 


" /in-1 ' 

P\ ■ 


^1 


P n 

P2 


<72 

— <7 n — i 

Pn + 1 

■ •’ P2n-5 

-Pn-2 P2n-5 P2n-4 - 


- G n — 2 - 


p2n — 3 - 


Here 

Pj = •S(f.4- ? r 0 = (A J , 1), > = 0,1,..., 

are the moments associated with (2.11). A FOP $„_i is called regular if it is uniquely 
determined by (2.12) up to a scalar, and it is said to be singular otherwise. We remark 
that a FOP of degree 0 is always regular. With (2.14), one easily verifies the statements 
in the following 
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Proposition 2.3. 

a) A regular FOP $«-i has exactly degree n - 1. In particular, a regular FOP is unique 
if it is required to be monic. 

b) A regular FOP of degree n - 1 exists if, and only if. the coefficient matrix of (2.14) is 
nonsingular. 

c) Let be a regular FOP (with respect to the formal inner product (2.11)) of degree 

n — 1. Then, a regular FOP of degree n exists if, and only if, (2.13) is satisfied. 

We remark that, by part b) of Proposition 2.3, singular FOP’s occur if, and only if, the 
corresponding linear system (2.14) has a singular coefficient matrix, but is consistent. If 
(2.14) is inconsistent, then no FOP $ n -i exists. This case is referred to as deficient , and 
by relaxing (2.12) slightly, one can define so-called deficient FOP s (see [Gut] for details). 
Simple examples (see, e.i?., [FN1, Section 13]) show that the singular and deficient cases 
do indeed occur. 

Now let us return to the classical nonsymmetric Lanczos process 2.1. Using (2.8), 
(2.10), (2.11), and part c) of Proposition 2.3, we conclude that a serious breakdown occurs 
if, and only if, no regular FOP exists for some L < L+. In this case, the termination index 
L is the smallest integer L for which there exists no regular FOP of degree L. 

On the other hand, there is a maximal subset of indices 

C {1,2....,!*}, n, := 1 < n 2 < • • • < nj < L„ (2.15) 

such that, for each j = 1,2, .... J, there exists a monic regular FOP $n,-i £ n„._i. Note 
that ni = 1 since $o(^) = 1 is a monic regular FOP of degree 0. Furthermore, three 
successive regular FOP's 'I' n ( _j, and ^-i are connected via a relation ot the 

form 

W «*„-,(*)♦,.,-.(*) - (216| 

where 'P„ ; -l € n n ^ +j _ n/ , <5 n> -i £ C. 

The recurrences (2.16) for FOP’s were mentioned by Gragg [Gra, pp. 222-223] and by 
Draux [Dra]; also, in the context of the partial realization problem, by Kung [Kun, Chapter 
IV] and Gragg and Lindquist [GL]. For a proof of (2.16), we refer the reader to [Gut]. 
Now, setting, in analogy to (2.10), 

V nj = <t>nj$ n ,-i(A)r 0 and w nj = ^ n> -i(A )s 0 , 

where <f > nj , rp nj ^ 0 are scaling factors, we obtain two sequences of vectors {u n> }/= i an d 
{ w nj }/=i which, in view of (2.16), can be computed by means of short recurrences. These 
vectors will be called regular vectors, since they correspond to regular FOP s. Note that 
v\ and w\ are always regular. The look-ahead Lanczos procedure is an extension of the 
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classical nonsvmmetric Lanczos algorithm; in exact arithmetic, it generates the vectors v„ } 
and w n , j = 1, . • • , J. If nj = L, in (2.15), then these vectors can be complemented to a 
basis for an ,4-invariant or .4 r - invariant subspace of C' V . An incurable breakdown occurs 
if, and only if, nj < I. in (2.15). Finally, note that the regular vectors t’„ ; and u- n , are 
uniquely defined (up to a nonzero scalar) by the biorthogonality relations 

wIv = w T v n =0 for all v € A' n .-i(r 0 , ,4), w € K nj -i( 5 o, A T ), 

n j n ) J [Z. li) 

j = 1,..., J. 


The look-ahead procedure we have sketched so far only skips over exact breakdowns. 
It yields what is called the nongeneric Lanczos algorithm in [Gut]. Of course, in finite 
precision arithmetic, a viable look-ahead Lanczos algorithm also needs to leap over near- 
breakdowns (2.9). Roughly speaking, a robust implementation should attempt to generate 
only the “well-defined 7 ’ regular vectors. In practice, then, one aims at generating two 
sequences of vectors {y„ Jt }^ =1 and {u> n;jk }it=i> w h ere 


{ n J*}i=l — { n j}j=i’ 


(2-18) 


is a suitable subset of (2.15). We set ji = 1, since tq and uq are always regular. 

Taylor [Tay] and Parlett, Taylor, and Liu [PTL] were the first to propose such a 
practical procedure. However, in [Tay, PTL], the details of an actual implementation are 

worked out only for look-ahead steps of length 2. 

In [FGN, FN1], Freund, Gutknecht, and Nachtigal have proposed an implementation 
of the look-ahead Lanczos method for general complex non-Hermitian matrices. The algo- 
rithm can handle look-ahead steps of any length and is not restricted to steps of length 2. 
On manv modern computer architectures, the computation of inner products of long tec- 
tors is a bottleneck. The algorithm described in [FGN, FNl] has the additional feature 
that it requires the same number of inner products as the classical Lanczos process, as 
opposed to the look-ahead algorithm described in [Tay, PTL], which always requires ad- 
ditional inner products. In particular, our implementation differs from the one in [Tat, 

PTL] even for look-ahead steps of length 2. 

In the next section, we present a sketch of the look-ahead Lanczos algorithm proposed 

in [FGN, FNl] and list some of its basic properties. 


2.3. The look-ahead Lanczos algorithm 


First, we introduce some notation. As in the last section, n = 1,2,... denote the indices 
of the Lanczos vectors v n and w n . FYom now on, we will always normalize the Lanczos 

vectors so that 


= U>7! 


= 1, n = 1,2, . 


(2.19) 
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For simplicity, we set n k : = n Jk for the indices of the “well-defined” regular vectors, cf. 
(2.18). However, notice that there is no guarantee that the indices n k generated by the 
look-ahead Lanczos algorithm in finite precision arithmetic actually satisfy (2. IS). The 
index k = 1,2, . . . is used as a counter for the computed regular Lanczos vectors v nk and 

u» ru- 
in order to obtain complete bases for the subspaces K n (rd,A) and h n (so,A ), we 

need to add vectors 

v n € K„(r 0 ,A) \ A'„_i(r 0 ,.4) and w n € K n (so,A T ) \ K n -\(so, A T ), ^ 

n = rzjt-i + l,...,n* - 1, k = 2,3,..., 

to the two sequences of computed regular vectors v„ k and w„ k , k = 1,2,..., respectively. 
The vectors in (2.20) are called inner vectors. We will refer to both the regular and the 
inner vectors v n and w n generated by the look-ahead variant as right and left Lanczos 
vectors, in analogy to the terminology for the classical nonsymmetric Lanczos Algorithm 
2.1. 

For each fixed n = 1,2, . . ., we denote by / = /(n) number of the last computed 
regular vector with index < n. Then, the first n Lanczos vectors iq, . . . , v n and uq, . . . , w n 
generated by the look-ahead Lanczos process can be grouped into / blocks 

V {k) = [i>m + i — l ] , ^ = [ +i ’ w ^k+i~ i 1 ’ 

k = 1,2,...,/- 1, (2-21) 

V (,) =[t’„, y n| + i ■■■ In], W (,) =[u> n/ Wn, + 1 ••• W’n]. 

In the sequel, we denote by 

h k =n k+ i - n k , k = 1,2,...,/- 1, h, = n - n, 

the number of vectors in each block. Note that the first vectors v„ k and iv nk in each block 
are just the regular vectors. The /th block is called complete if n = ni+i — 1; in this case, 
at the next step n + 1, a new block is started with the regular vectors u n , +1 and w ni + t . 
Otherwise, if n < n; + i — 1, the /th block is incomplete and at the next step, the Lanczos 
vectors u n -i-i an d are added to the /th block as inner vectors. 

So far, we have not specified how to actually construct the inner vectors. The point is 
that the inner vectors can be chosen such that the u n ’s and iv n 's from blocks corresponding 
to different indices k are still biorthogonal to each other. More precisely, in analogy to the 
biorthogonality relation (2.5) for the classical Lanczos algorithm, we have 

(,yU)gV(‘) = {° fl(J0 JjfJ; 2 j. (2.22) 
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We remark that the inner vectors constructed because of an exact breakdown correspond 
to singular or deficient FOP’s, while the inner vectors constructed because of a near- 
breakdown correspond to polynomials which in general are combinations of regular, sin- 
gular, and deficient FOP’s. 

Next, we show that the matrices D (k) in (2.22) are necessarily nonsingular. exc< for 
possibly the Ith. block, i.e., 

D {k] is nonsingular, k = 1, 2, 1, and D (l) is nonsingular if n = n, + 1 - 1. (2.23) 

Indeed, assume that is singular for some k < 1, where, in tue case 

k = /, the 1th block is complete. Then, there exists a vector z such that 

(W (fc) ) T V r(fc) z = 0 and V^z ^ 0. (2.24) 

With (2.22) and (2.24), it follows that v = i'„ t+1 + V (k) z fulfills 

w T v = 0 for all u: € K ni+1 -i(s 0 , A T ). (2.25) 

Using (2.17) and (2.25), we conclude that v = pv nk+1 for some scalar p ^ 0. which is 
impossible. 

With these preliminaries, the basic structure of the look-ahead Lanczos algorithm is 
as follows. 

Algorithm 2.4. (Sketch of the look-ahead Lanczos process.) 

0) Choose r 0 , s 0 G C N with r 0 , so # 0; 

Set t’i = r 0 /||r 0 ||, = Wlkoll; 

Set V = vi , W (1) = w u D = (WW) T VM; 

Set m = 1, / = 1, u 0 = W 0 = 0, V 0 = W 0 = 0, pi = = 1; 

For n = 1,2,... : 

1) Decide whether to construct u n + i and as regular or inner vectors 

and go to 2) or 3), respectively; 

2) (Regular step.) Compute 

v n+1 =Av n -V«\D^)-'( W^fAv n 

w n+1 = A t w„ - W (l \D {l) )- T {V w ) T A T w n 

- wV- l) (D«- l) )- T (V«- 1 ') T A T w n , 
set n/ + i = n + 1, / = / + 1, V (l) = W {l) = 0, and go to 4); 

3) (Inner step.) Compute 

Un-j-l = Av n Cn^n (t]n/ Pn) Vn — 1 

(2 27) 

W n+ 1 = A T W n - ( n w n - (r]n/(in)Wn - 1 

- W < ' , ~ 1) (D < ' l ~ 1) )~ T (V ( ' t ~~ 1) ) T A T w n ; 
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4) Compute p n +\ = ||5 n +i|| and £n+i = ||d> n +i||; 
If p„+i = 0 or £ n+ i = 0: set L = n, and stop; 
Otherwise, set 


l’n + 1 = V n +i/pn+U Wn+l = «Wl/fn+l- 

V(h = [V (D Vn+l ] t wW = [WW u , n+1 ], (2.2S) 

D (,) = {W {l) ) T V (l) . 

If only regular steps 2) are performed, all blocks have size h\ = 1 and Algorithm 2.4 reduces 
to the classical Lanczos process. Therefore, the strategy for the decision in step 1) should 
be such that regular steps axe performed whenever possible and blocks of size hk > 1 are 
built only to avoid exact or near- breakdowns. A practical procedure for the decision in 
step 1) will be discussed in Section 2.4. 

In (2.27), Cn and rj n , n = 0,1,..., are recurrence coefficients with rj nk = 0, k = 1, 2, . . .. 
One may choose these coefficients so that they remain the same from one block to the next 
and change only with respect to their index inside the block, n— n*, or one may choose these 
coefficients so that they change from one block to the next. For instance, one practical 
choice for the basic three- term recursions 

V = Av n — ( n Vn — 7n(vn-l/pn) and W = A T W n — C n u ’*» — r ?n(a’ n - l/£n) 

for generating the inner vectors in (2.27) is Chebyshev iteration [Man], where the recurrence 
coefficients are derived from suitably scaled and translated Chebyshev polynomials. In this 
case, the translation parameters could be adjusted using spectral information obtained 
from previous Lanczos steps. We do not necessarily advocate the use of fancy recursions in 
(2.27). From our experience, the algorithm we propose builds very small blocks, typically 
of size 2 or 3. Except for artificially constructed examples, the largest block we observed 
in test runs with “real-life” matrices was of size 4. It occurred for the SHERMAN5 matrix 
from the Harwell-Boeing set of sparse test matrices [DGL] where out of 1500 steps, the 
algorithm built 2x2 blocks 49 times, 3x3 blocks 7 times, and one 4 x 4 block (see [FN2, 
Example 2]). Hence, the recursion in (2.27) is not overly important, and in our experiments, 
we have used the recursion coefficients Cn = 1 and, if n ^ n*, r] n = 1. Finally, one could 
consider orthogonalizing (in the Euclidean sense) the right respectively left Lanczos vectors 
within each block. However, for the blocks we have seen built, such an orthogonalization 
process did not lead to better numerical properties of the algorithm. Therefore, in view 
of the additional inner products which need to be computed, orthogonalizing within each 
block is not justified. 

Next, we list some basic properties of Algorithm 2.4 which will be used in the sequel. 
First, note that the Lanczos vectors generated by Algorithm 2.4 indeed satisfy the block 
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biorthogonality relations (2.22). The proof is standard, using induction on n. and is 
omitted here. Setting, in analogy to (2.1), 


V n — [ t'l 1’2 

W„ = [u*i U'2 


v n ] =[F (1) V W ... V'(d], 

u>„] = [IF (1) IT r(2) ... IT’ 1 ')] , 


t, 2.20) 


one clearly has 


K„(r,.A)= {V<">r | *-6 C”}, 
A'„(s 0 ,.-l r ) = {w M z | z€C"). 


(2.30) 


Moreover, the recursions for the v’s in (2.26) and (2.27) can be rewritten in matrix formu- 
lation as follows: 

AV n = V n+1 Hl‘\ 


Here, 


where 




H n 

Pn+ie^ J 


H n = 


o 1 02 0 

72 a 2 

o 


0 


0 


01 

0 7/ ai 


is 


an n x n block tridiagonal matrix with blocks of the form 


(2.31) 

(2.32) 


(2.33) 


Ofc = 


* * 0 

Pn t + 1 * 

0 Pn k + 2 


0 


0 Prtk + 1 — 1 


7 k = 


0 pn k 
0 

. . . 0 


. (2.34) 


The blocks 0 k are in general full matrices. Furthermore, for k = 1, ...,/- 1, the matrices 
a k , 0 k , and 7* are of size h k x h k , h k - 1 x ft*, and ft* x A*-i, respectively. The matrices a,, 
and 7 / corresponding to the current block l are of size hi x /i/, 1 x hi, and hi x hi—i, 

respectively. Here hi = /i/ if the /th block is complete. 

In view of (2.33) and (2.34), H ( n e) is an upper Hessenberg matrix with positive subdi- 

agonal elements, and hence 

rank/T<‘> = n. ( 2 - 35 ) 
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In exact arithmetic, the stopping criterion in step 4) of Algorithm 2,4 will be satisfied 
after L+ steps, where L+ is given by (2.3) and (2.4), except in the very special situation of 
an incurable breakdown. Recall from Section 2.2 that an incurable breakdown occurs if. 
and only if, rtj < L* in (2.15). One can show (cf. [Gut]) that, if nj < Algorithm 2.4 
will produce, starting with the regular vectors u ni and w n[ where rii = nj. infinite blocks 
yW and of nonzero Lanczos vectors such that \ s the infinite zero matrix. 

We would like to stress that incurable breakdowns are very rare and do not present 
a problem in practice. Furthermore, even in the case of an incurable breakdown, the 
look-ahead Lanczos process still yields information on the spectrum of A, as Ta\lor [Ta\] 
showed in his Mismatch Theorem (see also [Gut, Par]). For later use, we summarize the 
termination properties of the look-ahead Lanczos process in the following 

Proposition 2.5. There is a termination index L < N such that, in exact arithmetic , 
Algorithm 2.4 will either stop in step n = L with /?l+i = 0 or £l+i = 0, or, starting vith 
the regular vectors vr+i an d j, an incurable breakdown will occur . If pt + i 0 or 
£l+i = 0, then v u ...,v L or u'i....,w L span the A-invariant subspace Kl(v u A) or the 
A t -invariant subspace Kl^sq^A^), respectively. Moreover , in all cases, 

A (H L ) C A (A). (2-36) 


2.4. The look-ahead strategy 

In this section, we discuss the criteria used to decide in step 1) of Algorithm 2.4 whether 
a pair of Lanczos vectors u n +i and w n +\ is built as inner vectors or as regular vectors. 
We propose three criteria, namely (2.40)-(2.42) below. If all three checks (2.40)-(2.42) 
are satisfied, then u n+1 and uwi are constructed as regular vectors, otherwise, they are 
constructed as inner vectors. Let us motivate these three criteria. 

First, recall (cf. (2.23)) that for r„+i and w n +i to be built as regular vectors it is 
necessary that D ^ is nonsingular. Therefore, it is tempting to base the decision regular 
versus inner step” solely on checking whether D (/) is close to singular, and to perform a 
regular step if, and only if, 

(2-37) 

for some suitably chosen tolerance tol. For example, Parlett [Par] suggests tol = e 1/4 or 
tol = e 1 / 3 , where e denotes the roundoff unit. Then (2.37) would guarantee that complete 
blocks of computed Lanczos vectors satisfy 

<Tmin (D^)>tol, k= 1,2,... . 


This, together with (2.22), would imply by [Par, Theorem 10.1] that 

tol 

/ — ~ Him \ - ■ 11 J — r~ 

y/n y n 


o'min(V’„) > — 7 = and cr miIi (W n ) > —j=, n — rik 1, k — 1, 2, . . . . 


(2.38) 
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Since the columns of V n and TV n are unit vectors, <7 m i n (V n ) and <7min(TT n) are a measure 
of the linear independence of these vectors: in particular, (2.38) would ensure that the 
Lanczos vectors remain linearly independent. However, in the outlined algorithm, the 
block orthogonality (2.22) is enforced only among two or three successive blocks, and in 
finite precision arithmetic, biorthogonality of blocks whose indices are far apart is typically 
lost. The theorem assumes that (2.22) holds for all indices, and without this, the theorem 
fails in finite arithmetic. We illustrate this with a simple example. 

Example 2.1. In Figure 2.1, we plot ^min (£)(«"))) (dots), min 1 <jt</(„) (cr min (D {k) )) solid 

line), and sjn <7 m j n (V„) (dotted line), as functions of the iteration index n = 1,2 , for a 

random 50 x 50 dense matrix. The theorem predicts that 

Vn <7 m j n (V n ) 2 min (<7 m j n (£>< fc) )), 

1 <k<l(n) 

which is clearly not the case. 



Figure 2.1. (dots), min^^/fn) (<7min(^^)) (solid line), and 

V^n cr min (Vn) (dotted line), plotted versus the iteration index n. 


As this simple example shows, the check (2.37) alone does not ensure that the corn- 
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puted Lanczos vectors are sufficiently linearly independent. In particular, if the look-ahead 
strategy is based only on criterion (2.37), the algorithm may produce, within a block. Lanc- 
zos vectors which are almost linearly dependent. When this happens, the check (2.37) usu- 
ally fails in all subsequent iterations and thus the algorithm never completes the current 
block, i.e., it has generated an artificial incurable breakdown. 

In addition, numerical experience indicates another problem with (2.37): for values 
of tol which are “reasonably" larger than machine epsilon, the behavior of the algorithm 
is very sensitive with respect to the actual value of tol. We also illustrate this with an 
example. 

Example 2.2. Here we consider the 3-D PDE 

Lu = f on (0,1) x (0,1) x (0,1), (2.39) 


where 




d_ 

dz 


du 


1 


du 

,*y __ 

' dz 

- 25(f) u, 


+ 30(x + y + z)— + 

v Ox V 1 + x + J/ + 2 

with Dirichlet boundary conditions u = 0. The right-hand side / is chosen such that 
U = (1 - x)(l - j/)(l - z) (1 - e- r ) (1 - e-q (1 - e-q 


is the exact solution of (2.39). We discretize (2.39) using centered differences on a unifoim 
15 x 15 x 15 grid with mesh size h = 1/16.- This leads to a linear system (11) "hh 
real nonsymmetric coefficient matrix A of order N = 3375 and 222/5 nonzero elements. 
We applied the QMR Algorithm 3.1 based on the look-ahead Lanczos Algorithm 2.4 to 
this linear system. As initial guess, we used .to = 0. and, in Algorithm 2.4. -'o = v>, ' l - s 

chosen. This example was run on a machine with e ~ 1.3E— 29. In the first case, we set 
tol = e 1/4 as 6.0E— OS, while in the second case, we set tol = e 1 / 3 % 2.3E-10. In Figure 2.2. 
we plot <x min (I> (,(n ») versus the iteration index n for the two runs, the dotted line for e l / ' 1 
and the solid line for e 1/3 . In the first case, the algorithm starts building a block which it 
never closes, and the singular values clearly become smaller and smaller. \et if tol is onh 
slightly smaller, as in the second case, the algorithm runs to completion, in this case solving 
the linear system to the desired accuracy, and thus indicating that the block built in the 
first case was not a true, but an artificial incurable breakdown. Furthermoie, in the second 
case, the QMR approach takes n = 149 steps to reduce the norm of the initial residual 
by a factor of 10 -6 ; see Figure 2.3, where the relative residual norm j|r n || / ||r 0 || is plotted 
versus n (solid line). For the run with tol = e 1/4 « 6.0E-08, the resulting convergence 
curve is shown as the dotted line in Figure 2.3. Notice that, due to the artificial incut able 
breakdown, QMR does not converge in this case. 
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Figure 2.3. Relative residual norm |(r n || / ||ro|| plotted versus n. 

These numerical examples clearly show that the decision "regular versus inner step 
cannot be based on (2.37) alone. Instead, we propose to relax the check (2.3i ). so that 
it merely ensures that 22^”^ is numerically nonsingular, and to add the checks (2.41) 
and (2.42) below which guarantee that the computed Lanczos vectors remain sufficiently 
linearly independent. Hence, instead of (2.37), we check for 

*run(D {,in)) ) > e, (2.40) 


where e denotes the roundoff unit. 

Our numerical experiments have shown that typically the algorithm starts to generate 
Lanczos vectors which are almost linearly dependent, once a regular vector e n + i was 
computed whose component Av n € A' n +i(ro, .4) is dominated by its component in the 
previous Krylov space I\ n (i'o,A) (and similarly for u>„+i). 

In order to avoid the construction of such regular vectors, we check the /j -norm of the 


21 



coefficients for V' (,_1) and V U) in 1 2.26); r n+ , can be computed as a regular vector only if 

Y h(D''- 1) )- , (H /( '" 1) ) r Ae ri ) 7 | < n(.4) 

(2.41) 

and Y, |((P (0 )- 1 (H' (,) ) T -4e n ),| < n(A). 

; = n, 

Here n(A) is a factor depending on the norm of .4; we will indicate later how this factor 
is computed. Similarly, we check the ^-norm of the coefficients for and H in 

(2.26); u>n+i can be computed as a regular vector only if 

Y |((jD (/- ^) _T (T /(,-1) )' r .4 r ie n )j| < n(A) 

j-ru-i (2.42) 

and Y \((D (l) r T (V^fA T w n ) } \ < n(A). 

J = ru 

The pair v n+i and u’ n + i is built as regular vectors only if all the checks in ( 2.40)— (2.42) 
hold true. 

We need to indicate how n(.4) is chosen in (2.41) and (2.42). Numerical experience 
with matrices whose norm is known indicates that setting n(,4) = ||-4|| is too stiict and can 
result in artificial incurable breakdowns. A better setting seems to be n(.4) = 10 • ||.4||. but 
even this is dependent on the matrix. In any case, in practice one does not know ||.4||. and 
there is also the issue of a maximal block size, determined by limits on available storage. 
To solve the problems of estimating the norms and a suitable factor n(.4). as well as cope 
with limited storage and yet allow the algorithm to proceed as far as possible, we propose 
the following procedure. Suppose we are given an initial value for n(.4). based either on 
an estimate from the user (for example, n(A) from a previous run with the matrix .4), oi 
by setting 

n(,4) = max { ||*4ui || , ||A T u-i || } . 

Note that here .4 denotes the matrix actually used in generating the Lanczos vectors, thus 
including the case when we are solving a preconditioned linear system (cf. Section 3.6). 
We then update n{A) dynamically, as follows. In each block, whenever an inner vector is 
built because one of the checks (2.41) or (2.42) is not satisfied, the algorithm keeps tiack of 
the size of the terms that have caused one or more of (2.41)-(2.42) to be false. If the block 
closes naturally, then this information is not needed. If, however, the algorithm is about 
to run out of storage, then n(A) is replaced with the smallest value which has caused an 
inner vector to be built. The updated value of n(A) is guaranteed to pass all the checks in 
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(2.41 ' and (2.42) at least once, and hence the block is guaranteed to close. This also frees 
up the storage that was used by the previous block, thus ensuring that the algotithm can 
proceed. 

2.5. Implementation details 

We now turn to a few implementation details for Algorithm 2.4. In particular, we show 
that our implementation of the look-ahead Lanczos process requires the same number ol 
inner products per step as the classical Lanczos Algorithm 2.1. For a regular step, one 
needs to compute D {1 \ (W^) T Av n , and {W {l ~ l '>) T Av n in (2.26). For an inner step, one 
needs to compute (TF (, - 1) ) r .4u n in (2.27) and to update D {1) in (2.2S). We will show that 
for a block of size hi , only 2 hi inner products are required: 2 hi — 1 will be required to 
compute £> (, \ and one inner product will be required to compute (W {l) ) T Av n . We will 
obtain (W^ i-1 ^)' r At’ n without performing any inner products. Note that a block of size 
hi in Algorithm 2.4 corresponds to hi steps in Algorithm 2.1, which each require 2 inner 
products. In addition, in step 4) of the look-ahead Lanczos algorithm. Euclidean norms 
of 2 vectors of length N need to be computed. However, for a robust implementation of 
the classical Lanczos process it is also advisable to scale the Lanczos vectors v„ and u> n in 
Algorithm 2.1 to have unit length, cf. [Tay, PTL]. 

To simplify the derivations, we will use the “monic” versions 

v n = — v n - 4 > n -i(A)r 0 and w n = —w n = 4> n - 1 (A T )s 0 (2.43) 

<Pn Vn 

of the Lanczos vectors r„ and w n . where $ n _i € II„_i is monic and <?„. € C. B\ 

V r (0 < f )( l ) , . . we denote the matrices defined as in (2.21) and (2.22). with the mpnic 
vectors instead of the original Lanczos vectors. Clearly, all quantities involving the original 
vectors i? n and w„ can be obtained from the corresponding quantities involving i'„ and w n 
simply by scaling. Finally, we remark that, using a similar argument as in (2.44) below, 
one easily verifies that 

(W (,) ) r .4u n = ( V (,) ) T A T w n and {W«-») T Av n = {V (l ~ l) ) T A T w n . 

Therefore, the coefficients (D^)~ T (V^) T A T iv n and ( D ^ J) ) r (V (( 1) ) T .4 r u.’„, which 
occur in the recursions for the left Lanczos vectors in (2.26) or (2.27), can be generated from 
(£) (<) ) -1 (T , F^^) r Au n and (I?^ -1 ^) -1 (W^ -1) ) T .4 i;„, without computing any additional in- 
ner products. 

Consider first D {1) . Using (2.43) and the fact that polynomials in .4 commute, we 
deduce that 

wjv m =sZ4> J - 1 (A)4> m - 1 (A)r 0 = s%$ m -i(A)$j-i (A)r 0 =w^v r (2.44) 
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This shows that the matrix D {,) is symmetric, and hence we only need to compute its 
upper triangle. 

We will now show that once the diagonal and first superdiagonal of D^ 1 have been 
computed by inner products, the remaining upper triangle can be computed by recurrences. 
Let w } and v m be two vectors from the current block. Usmg (2.27) and the fact tha f the 
inner vectors from block / are biorthogonal to the vectors from the previous block, wc. rave 

wji'm — wJ(Av m - 1 — Cm-lf’m-] ~ Vm-lC'm-2) 

— — (.4 Wj f) t ' m - 3 Cm — 1 d’j ^m — 1 — Urn — 2 

= ( t& j + X "1" d" Vj Wj — \ ) 5m — 1 Cm — iWj Vm — 1 "Qm—l w j ^m — 2 

= wj+l -1 + C; &J - 1 + T]jlvJ_^V m — i “ Cm- 1 ^ j 5 m -l — ^ m - 1 ^ j 5 m — 2* 

Thus, wjv m depends only on elements of D (l) from the previous two columns, and hence, 
with the exception of the diagonal and the first superdiagonah can be computed without 
any additional inner products. Note that the recurrences and the biorthogonality used 
in the above derivation are enforced numerically, and so computing wjv m by the above 
recurrence should give the same results - up to roundoff - as computing the inner product 
directly. 

We will now show how to compute {W^ l) ) T Av n with only one additional inner product, 
while (W^) T Av n can be obtained with no additional inner products. Consider wj Av n , 
for Wj a vector from either the current or the previous block. We have 

wjAv n = (A T Wj) T V n = (Wj + l + Cj™j + Vj ™)- 1 
= wJ+\Vn + Cjtijvn + 

For j < n, - 1. {W i ‘- 1) ) T i' n = 0, and hence wjAv n = 0. For j = n, - 1, the above 
reduces to w'j^ l _ l Av n = w^ ( v n , which is computed as part of the first row of D il} . For 
m < j < rii+\, all of the terms needed are available from Finally, for the last vector 

in the current block, j — n/+i — 1, we do not have u , ^' |+l v n . and hence have to compute it 
directly, thus requiring another inner product. 
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3. A quasi-minimal residual method for general non-Hermitian matrices 


We now turn to linear systems (1.1). From now on, it is always assumed that A is nonsin- 
gular. Furthermore, all iterative algorithms considered in the sequel are Krylov subspace 
methods, i.e., their iterates x n , n = 1,2, — satisfy (1.2). where ro G C is any gi\en 
initial guess for the exact solution A~ l b of (1.1). Finally, r n = b — -4x n always denotes the 
residual vector corresponding to the nth iterate x n . 


3.1. The quasi-minimal residual approach 

In this section, we describe the basic idea of the QMR approach for solving general non- 
Hermitian linear systems (1.1). 

We set 

Po = Hj"o||, Vi = r 0 /po- (3- 1 ) 

Let v\,vi,...,v n be the right Lanczos vectors generated by Algorithm 2.4, with the nor- 
malized initial residual tq as one of the two starting vectors. By the first relation in (_.30), 
we have the parametrization 


x 0 + v;~~, 2 € c n , 


(3.2) 


for all possible iterates (1.2). Note that the second starting vector, uq G C iV , is still 
unspecified. Due to the lack of a criterion for the choice of uq, one usually sets uq — t'i in 
practice. 

From (3.1) and (2.31), the residual vectors corresponding to (3.2) satisfy 


= r„ - AV„z = r„ -'vWlffi*’-- = Vn+l (l>A" + " - «<*»--) . 


Next, we introduce an (n + 1) x (n + 1) diagonal weight matrix 

Q n = diag(u;i ,cJ2t • • • ,^n+ih > 0 , j = 1 , . . . , n + 1 , 


(3.3) 


(3.4) 


to serve as a free parameter that can be used to modify the scaling of the problem. With 
it, (3.3) reads 

r„ = V n+ ,n- , fi„( / ,oe < 1 " +1) -^< ,, d 

= K+ift; 1 (in - , with d„ = k>,p 0 eS" +1) . 

Ideally, we would like to choose 2 G C n in (3.5) such that ||r„|| is minimal. However, since 
in general V n +i is not unitary, this would require 0(Nn 2 ) work, which is too expensive. 
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We will instead minimize just the Euclidean norm of the bracketed terms in (3.5), t.e., we 
will choose z = z n € C n as the solution of the least squares problem 


d n -Q n H { n e) : n 

= min 

d n -n n H ( n e) z\ 


*eC n 



(3.6) 


By (2.35) and (3.4), H { n e] and Q n H { n e) are (n + l)xn matrices with full column rank n. This 
guarantees that the solution z n of (3.6) is unique and hence, via (3.2), defines a unique nth 
iterate x n . In view of the minimization property (3.6), we refer to this iteration scheme as 
the quasi-minimal residual (QMR) method. Clearly, the QMR iterates still depend on the 
choice of the weights u)j in (3.4). In our numerical experiments, the simplest scaling 


tOj — 1, j — 1,2,..., 


(3.7) 


gave satisfactory results. Recall from (2.19) that all the columns of 1 a+i are unit vec- 
tors. Hence, the scaling (3.7) ensures that all basis vectors v } / uij , j = l,...,n + 1, in 
the representation (3.5) of r n have the same Euclidean length; this is a "natural’ require- 
ment. However, better strategies for choosing f) n might be possible, and therefore we have 
formulated the QMR approach with a general scaling matrix fi n . 

For the solution of the least squares problem (3.6), we use the standard approach (see, 
e.g., [GVL, Chapter 6]) based on a QR decomposition of Q„H^: 


si„hP=qH 


Rn 

0 


(3.8) 


Here, Q n is a unitary (n + 1) x (n + 1) matrix, and R n is a nonsingular upper triangular 
n x n matrix. Inserting (3.S) in (3.6) yields 


min 

*eC n 




mm 

r 6 C n 



Qndn 



min 

:6C" 


Qnd-n 



:i 

i 


Hence, z n is given by 


z n — 



' t l ' 


tn 

where t n = 





T n +1 


- T~ n - 




Qn^n 


(3.9) 


Furthermore, we have 



(3.10) 


We conclude this section by summarizing the basic structure of the QMR algorithm. 
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Algorithm 3.1. (QMR algorithm.) 

0) Choose io G C 1 ' and set ro = b — .4 Jo- P o = i! r o||> r i = r o/po> 

Choose w\ € C‘ v with ||u>x|| = 1; 

For n = 1, 2, . . . : 

jj Perform the nth iteration of the look-ahead Lanczos Algorithm 2.4, 

This yields matrices V n , V n +i, which satisfy (2.31); 

2) Update the QR factorization (3.8) ofQ n H { n e) and the vector t n in (3.9); 

3) Compute 

x n = To + V n R~ l t n \ (3-11) 

4) If x„ has converged: stop. 


3.2. Implementation details 

In this section, we give some of the details for the actual implementation of steps 2), 3), 
and 4) of the QMR Algorithm 3.1. In particular, it is shown that the QMR iterates i„ can 
be computed with short recurrences. This approach for updating the iterates x n is based 
on a technique which was first used by Paige and Saunders [PS] in connection with their 
SYMMLQ and MINRES algorithms for real symmetric matrices. 

First, note that the QR decomposition (3.8) of Q n Hn^ can be computed by means 
of n Givens rotations, taking advantage of the fact that Q n Hh is an upper Hessenberg 
matrix. Hence, the unitary factor in (3.8) is of the form 


Qn — G 


0 

7 

c 

1 


'Gh 0 1 

0 

J 


O 


(3.12) 


where, for j = 1, 2, .... n, 


G,= 


I,-i 0 0 


C } Sj 


0 -sj Cj J 


, with c ; € R. Sj € C. cj 4- |sj| 2 — 1. 


(3.13) 


Recall that, in view of (2.33) and (2.32), £l n Hh ' is block tridiagonal. Therefore, the uppei 
triangular factor in (3.8) is of the form 

8\ e 2 83 0 ■ • • 0 

0 82 €3 ' • ' • : 

: • . 83 ' ■ ' ■ 0 

: e, 


Rn = 


0 


0 8 , 


(3.14) 
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where the blocks 6 k and e k are of the same size as the blocks a k and J k , respectively, 

in (2.33). Moreover, the diagonal blocks S k are nonsingular upper triangular matrices. 

Clearly, a QR decomposition based on unitary matrices (3.12) limits fill-in to the row- 
above each block 3 k in (2.33). Hence each of the blocks 6 k in f 3.14 ) has possible nonzero 
entries only in its last row. 

Next, we note that the decomposition (3.8) is easily updated from the factorization 
of n n - l H ( n e l i of the previous step n - 1. Indeed, to obtain one only needs to compute 
its last column, 

Ul Hn ] r = Rn^nK (315) 

and append it to R n — i* This is done by first multiplying the last column of Q n Hn by 
the previous Givens rotations; by (2.33), this last column has zero entries in positions 
1,2, .. . ,nc;, where 

{ max (ni-i - 1, 1) if v n is an inner vector, 

max (n/_ 2 -1,1) if is a regular vector. 

Therefore, only the Givens rotations with indices no, no + 1, . • • , n — 1 have to be applied, 
and, by setting 


v 


roi 


Q 

a 

1 

o 

i 


C5 

a 

1 

Is} 

o 

i 


0 1 

i 

o 

►—* 

1 


0 1 


7 

o 

i 





(3.16) 


we obtain the desired vector (3.15) up to its last component /i„. It remains to multiply 
(3.16) by a suitably chosen Givens rotation G n which zeros out the last element v = 
(^’n+i/>n+i- To achieve this, set 

c„ = =, 5^ = c„ if V / 0. 

y/M 1 + M 2 " ( 3 - 17 > 

c„ = 0, = 1, if /i = 0. 

and finally one gets /j. n = c„/i + s n u. For later use, we notice that 

\s n H„\ = U> n + ip n + i, (3.18) 


which is readily verified using (3.17). The vector t n in (3.9) is updated by setting 

, _ s-t ^n-l 

L n — n q 
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Clearly, t n differs from t n -i only in its last two entries which are given by 

T n = c n f n and f„ + i = -s^ f n . (3.19) 

Next, we turn to the computation of the QMR iterates j n in (3.11). We define vectors 
Pj via 

P„ = [pi P2 ... Pn]=V„R~'. (3.20) 

Then, with (3.11) and (3.9), it follows that 

Xn, — — i Priori' 

It remains to show how to compute p n . In analogy to the partitioning of V n in (2.21) and 
(2.29), we group the columns of P„ into blocks 

P n = [pW PW ... p(0], (3.21) 

With (3.20), (3.14), and (3.21), one obtains the relation 

P U) = (y<0 - P°- l) tt - P ( '- 2) 0,) (3.22) 

and thus p n can be updated via short recurrences. 

Finally, for step 4) of Algorithm 3.1, a convergence criterion is needed. We stop the 

QMR iteration as soon as 

||r„|| < tol • ||r 0 |I; (3-23) 

here tol is a suitable tolerance, e.g., tol = 10~ 6 . In the QMR algorithm described so far. 
neither the residual vectors r n nor their norms ||r„|| are generated explicitly. However, in 
part a) of the next proposition, we derive an upper bound for ]|r n || which is available at 
no extra cost. In our implementation, the convergence criterion is checked for this upper- 
bound, (3.24), rather than ||r n ||. Once this test is satisfied, we switch over to checking 
(3.23) for the true residual norm ||r n ||. Typically, this is necessary only in the last one or 
two iterations, since (3.24) is a good upper bound for ||r n ||. 

The residual vector itself can be easily updated at the expense of one additional 
SAXPY per iteration, based on the recursion given in part b) of the following 

Proposition 3.2. For n = 1,2,... : 

a) 

||r n || < ||r 0 || Vn + 1 |sis a ■ ■ • . max (wj/wj); 


(3.24) 



(3.26) 


Proof. By taking norms in (3.5) and with (3.10), we obtain 

II r n J| < \\V n +i\\ • \\n~ l \\ ■ l^n+l|- 
Now, from (2.19) and (2.29), U n+1 has n + 1 columns of Euclidean norm 1, and this implies 



\\V n+ i\\ < yn + 1. 

-27) 

Furthermore, by (3.4), 

lien’ll < . max (1M). 

) = 1 ,..., 71+1 

(3. 28) 

Finally, by (3.19), 

l^n+l 1 = |n| * |^l*S2 " ^n — 1 | 1 

(3.29) 

where, in view of (3.9), (3.5), and (3.1), 



II 

— 

o 

5 

(3.30) 


and by combining (3.26-3.30), one obtains the inequality (3.24). 

Now we turn to part b). By inserting z = z„ from (3.9) in (3.5) and using (3.8), one 

obtains 

r„ = fn + lZ/n+l, (3.31) 


where 


ro 


-i nH 


y n+l = v n+1 n~ l Q 


0 

1 


From (3.12), one readily verifies that two successive vectors y„+ 1 and y n in (3.31) are 
connected by 

y n +i = -s„y n + — — v„+i. (3.32) 

w„+i 

Finally, by inserting (3.32) in (3.31) and using the second relation in (3.19), we arrive at 
(3.25). □ 


3.3. The connection between QMR and BCG 

In this section, we are concerned with the connection between QMR and BCG. In partic- 
ular, it is shown that BCG iterates can be easily recovered from the QMR process. 

In the BCG approach, one aims at computing iterates x n which are characterized by 

the Galerkin type condition 

w T (b - Ax n ) = 0 for all w € K n (s 0 , A T ), x n € x 0 + K n {r 0 ,A). (3.33) 

(see, e.y., [Saal]). Here, s 0 6 C N is any nonzero vector. Usually, one sets s 0 = r o- In the 
classical BCG algorithm [Lan2, Fie, Jac], the iterates (3.33) are generated as follows. 
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Algorithm 3.3. (BCG algorithm.) 

0) Choose x 0 6 C' V and set q 0 = r 0 = b - Ax 0 ; 

Choose s 0 € C N , s 0 0, and set q 0 = f 0 = s 0 ; 

For n = 1,2,... 

1 ) Compute 6„ = r„_! r n _i /q^-i T<?n-i and set x n = x n -i + ^n?n-i ■ 

Set r n = r„_! - 8 n Aq n - \ and r„ = r n _i - <5„A q n -i ; 

2) Compute p n = f^r n /f^_ l r n -i ; 

Set q n = r n + p n q n - 1 and q n = r n + p n q n -i; 

3) If r n = 0 or r„ = 0 , stop. 

BCG is closely related to the classical nonsymmetric Lanczos algorithm. Indeed (see, 
e.g., [Saal]), for n = 1,2, . . ., 

r n _ i = <f> n v n i </>n £ C, <f> ^ 0, and r„_i = ip n w n , rpn € C, j/> 0, (3.34) 

where t>„ and u> n denote the vectors generated by the classical Lanczos Algorithm 2.1 with 
starting vectors 

ro and sq . (3.35) 


Unfortunately, like the Lanczos algorithm, BCG is also susceptible to breakdowns and 
numerical instabilities. Obviously, Algorithm 3.3 breaks down prematurely, if 


q'n — i Aq n — i — 0, r n _i ^ 0, r„_i 7 ^ 0, 


(3.36) 


or 


r^-Fn-i = 0, f n -\ 0, r„_i 7^ 0, 


(3.37) 


occurs. We will refer to (3.36) and (3.37) as breakdown of the first and second kind. 
respectively. In general, Galerkin iterates (3.33) need not exist for every n. This is the 
cause of the breakdown of the first kind. Indeed, one can show that (3.36) occurs if no 
BCG iterate x n exists. Breakdowns of the second kind have a different cause: by (3.34), 
(3.37) is equivalent to a serious breakdown in the classical nonsymmetric Lanczos process. 

Next, we rewrite the Galerkin condition (3.33) in terms of the look-ahead Lanczos 
Algorithm 2.4, started with the initial vectors (3.35). This yields a formulation of the 
BCG approach for which breakdowns of the second kind, except for ones caused by an 
incurable breakdown in the look-ahead Lanczos process, cannot occur. In analogy to (3.2), 
we use the parametrization 


X ji — Xq 4 Vn^ni 


«n € 


C", 


(3.38) 


for the BCG iterates. Then, by ( 2 . 31 ), the corresponding residual vector satisfies 

r n = b-Ax n = V n (f n -H n u n )-(u n ) n v „+i, with f n = Poe[ n) . (3-39) 
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By inserting (3.39) in (3.33) and using (2.30), it follows that the iterate (3.3S) satisfies 
(3.33) if, and only if. 

WJ^V n (f n -H n u n ) = (u n ) n Wjv n+1 . (3.40) 

To simplify the discussion of (3.40), we will attempt to recover the BCG iterate only when 
the current block l = l(n) in Algorithm 2.4 is complete. Therefore, in the sequel, it is 
always assumed that n = — 1. This ensures that, in view of (2.22) and (2.23), the 

linear system (3.40) reduces to 

H n u n = /„, (3-41) 

from which we can now derive a simple criterion for the existence of the nth BCG iterate. 

Proposition 3.4. Let n — n^i — 1, / = 0, 1, . . .. Then , the following three conditions are 
equivalent: 

(i) the BCG iterate x defined by (3.33) exists; 

(ii) H n is nonsingular: 

(iii) c n / 0. 

Moreover, if exists, then 


BCG _ OMR 
n ~ x n 


+ 



(3.42) 


r.BOC 

r n 


||r 0 || • l^i $2 ■ ■ ■ •Sn-l'Snl 


U 1 

^'n + l Cn 


(3.43) 


Proof. Clearly, an nth BCG iterate exists iff the linear system (3.41) has a solution. From 
(3.39), (2.33), and (2.34), the extended coefficient matrix [/„ H n ] of (3.41) is an upper 
triangular matrix whose diagonal elements are all nonzero, and thus it has full row rank 
n. Consequently, (3.41) has’a solution iff H n is nonsingular. This shows the equivalence 
of (i) and (ii). 

Next, using (3.S). (2.32), and (3.12), one readily verifies that 


Q — n 


fix — l 0 

0 c n 


R n - 


(3.44) 


This relation implies that (ii) and (iii) are equivalent. 

Now assume c n 7^ 0. From (3.41) and (3.44) it follows that 


Hn R n 


-1 


In - 1 0 

0 1 /c n 


Q n — 1 fl n — 1 f ix - 


(3.45) 


Recalling the definitions of d„ and f n in (3.5) and (3.39), and using (3.9), we can rewrite 
(3.45) as follows: 


u n = z n + R n 


-1 


[ Tn/c n — T n J 


(3.46) 
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By comparing (3.3S) and (3.46) with (3.2) and (3.9), and by using (3.20), we obtain the 
relation 


: BOC = X QMR + ^ _ r „ J pn 


which, by (3.19). is just (3.42). By inserting (3.41) in (3.39). it follows that 


r n — (Un)n^n + 1' 


(3.47 


From (3.47). (3.46), and (3.9), we obtain 


.BCG 



where /j. n 


(•^n)n,n 


(3.48) 


In view of (3.18), 

I|wn+i|| = Pn+I = (3.49) 

k’n + I 

Then, by inserting (3.49), (3.29), and (3.30) in (3.48), we get (3.43), and this concludes 
the proof. 

Proposition 3.4 shows that existing BCG iterates can be recovered easily from the 
QMR process. By (3.43), ||r^ CG || can be computed at no extra cost from quantities which 
are generated in the QMR Algorithm 3.1 anyway. In particular, one may monitor ||r^ CG || 
during the course of the QMR iteration, and compute x via (3.42) whenever the actual 
BCG iterate is desired. 

Finally, we remark that CGS [Son] and Bi-CGSTAB [Van] are modificat ions of the 
BCG Algorithm 3.3. In many cases, these algorithms have better convergence properties 
than BCG. However, neither CGS nor Bi-CGSTAB addresses the problem of breakdowns. 
Indeed, one can show that, in exact arithmetic, CGS as well as Bi-CGSTAB break down 
every time BCG does. 


3.4. A convergence theorem 

In this section, we derive bounds for the QMR residuals which are essentially the same as 
the standard bounds for GMRES. To the best of the author's knowledge, this is the first 
convergence result for a BCG-like algorithm for general non-Hermitian matrices. 

Let L denote the termination index of the look-ahead Lanczos Algorithm 2.4, as 
introduced in Proposition 2.5. We remark that, in exact arithmetic, the QMR Algorithm 
3.1 will also terminate in step n = L. For a diagonalizable matrix A/, we denote by 


k(M) = 


X: 


mm 

X ~ 1 MX diagonal 




the condition number for the eigenvalue problem of M (see, e.g., [BBG, p.46]). 
The main result of this section can then be formulated as follows. 
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Theorem 3.5. Suppose that the L x L matrix H L generated by L steps of the look-ahead 
Lanczos Algorithm 2.4 is diagonalizable, and set 

H = Sl L -xH L nil v 13 50) 

Then, for n = 1,2, ...,£ — 1, the residual vectors of the QMR Algorithm 3. 1 satisfy 

||r„|| < |[r 0 |Kff) y/n + 1 £„ _max (wj/o/j), (3.51) 

j — 1 , .. . , n *4" 1 


where 


e n = min max |$(A)|. 
*en„:*(o)=i A€A (A) 


(3.52) 


Moreover, if Algorithm 2.4 terminates with pi+i = 0> then xi = A 1 b is the exact solution 
of Ax = b. 

Proof. Using (3.26-3.28), (3.10), (3. 5-3. 6), and (3.1), one readily verifies that 


|r n || < ||r 0 || Vn + 1 i9 n max fa /vj), 

; = .,n+l 


where i9 n is given by 


t9 n = min 
--eC n 


e[ n+l) -fl n H { n e) z 

Therefore, for the proof of (3.51), it remains to show that 

l?n < k (H) e n- 

In the following, let n e {1, 2, . . . , L - 1} be arbitrary, but fixed. By 

Hi = 

and (3.50), we have 


(3.53) 


(3.54) 


0 * 


H 


z 


'QnH^Q-^z' 

0 


0 


for all z E C". 


(3.55) 


Recall that Hi, and therefore also H, is an upper Hessenberg matrix with nonzero subdi- 
agonal elements. This implies that 


2 € C n | = | 


(3.56) 
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Using (3.55-3.56), we can rewrite (3.53) as follows: 




min 

*-eC n 


(L) 


- H 



min 

4-6n„:4-(0) = l 


$(H)e 


(L) 

1 


(3.57) 


Hi is assumed diagonalizable, so, by (3.50), H is also diagonalizable, and by expanding 
into any set of eigenvectors of H , we deduce from (3.5/) that 


i9 n < k(H) min max 
“ <J>€n n :$(0) = l ASA (H) 




(3.5S) 


By (3.50) and (2.36), we have A (H) = X(Hi) C A(A), and thus (3.58) is equivalent to the 
desired inequality (3.54). 

Finally, we need to show that xi — .4 -1 6, if Algorithm 2.4 terminates with pl+\ — 0. 
For n = L and pi+\ — 0, the least squares problem (3.6) reduces to a linear system 
with coefficient matrix Qi-iHi. Since .4 is nonsingular, by (2.36), this linear system is 
nonsingular, and hence it can be solved exactly. Therefore, ri - 0 and this concludes the 
proof. □ 

Recall (cf. Proposition 2.5) that, in exact arithmetic, it can also happen that the QMR 
algorithm terminates with Pl+i ^ 0. In this case, one restarts the QMR method, using 
the last available QMR iterate as the new initial guess. Theorem 3.5 shows that xl-\ 
is a good choice. However, the finite termination property of the look-ahead Lanczos 
Algorithm 2.4 is usually lost in finite precision arithmetic. In particular, situations where 
the QMR algorithm needs to be restarted are very rare in practice. 

We remark that for the “natural” scaling uj = 1, the bound (3.51) simplifies some- 
what . 

Next, we contrast the bounds (3.51) for QMR with the standard bounds [SS2] foi 
GMRES. Assume that A is a diagonalizable matrix. Then, the residuals r ^ !RES generated 
by the GMRES algorithm (without restarts) satisfy 


||rf^||<||ro|K^n, n = l,2,... , 

where, as before, £ n is given by (3.52). Hence, up to the slow growing factor \/n + 1 in 
(3.51) and different constants, the error bounds for QMR and GMRES are essentially the 

same. 

In general, simple upper bounds for (3.52) are known only for special cases. For 
example, assume that the eigenvalues are contained in an ellipse in the complex plane 
which does not contain the origin: 


A (A) cf, 0 £ £. 
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Let /i ^ f 2 denote the two foci of £. The ellipse can be represented in the form 


£ = { A £ C 


IW.l + IA — h\< - 2 AI (>• + ; 


with r > 1. 


Moreover, let R be the unique solution of 


i ( P , 1 \ l/il + i/2 1 R ^ 1 

H* + 5>)“TATAP R>1 ' 


The linear transformation 


2 = 2 (A) = 


2A - f x - h 

h-h 


maps £ onto the ellipse 


£r = <Z £ C 


|z - 1| + \z + 1| < r + - 


(3.59) 


and the origin 0 in the A-plane onto a point a £ Q£r on the boundary of Sr in the 2 - plane. 
Here, £r is the ellipse defined as in (3.59), with r replaced by R. Clearly, 0 & £ implies 
R > r. Then, by applying Theorem 3.6 below, we obtain the following upper bound for 
(3.52): 

r n + l/r n 


< 


R n + 1 /R n 

Theorem 3.6. Let r > 1. a 6 0£r, R > r. Then, 


, n = 1,2,... 


min 

<i>en n :<i-(o)=i 


max |$(z)| < 

Z££r 


r n + l/r n 
R n + 1/R n 


n 


1 9 
1 


(3.60) 


The upper bound (3.60) is due to Fischer and Freund [FF, Theorem 2]. Furthermore, in 
[FF] it is shown that equality holds in (3.60), if r > 1 and R is not “too close to r. 


3.5. QMR for shifted matrices 

In this section, we are concerned with situations where .4 is given as a shifted matrix of 
the form 

.4 = M + crl, M £ C"*", o £ C. (3.61) 

Obviously, one has 

A'„(r 0 ,.4) = K n (r 0 ,M) and A'„(s 0 ,>l T ) = A'„(s 0 ,M T ), n = 1,2,..., (3.62) 

and it is easily verified that the look-ahead Lanczos Algorithm 2.4 applied to A or M indeed 
generates identical basis vectors for the Krylov subspaces (3.62), provided the recurrence 
coefficients (n in (2.27) are shifted correspondingly. More precisely, we have 
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Proposition 3.7. Let v„ and w n (respectively v„ and w n ) be the Lanczos vectors gener- 
ated by Algorithm 2.4 applied to M (respectively A = M + o I) with recurrence coefficients 
and rj n ( respectively ( n = ( n + <7 and p n = r} n ). Then, the termination index L (cf. 
Proposition 2.5) is the same in both cases, and 

v n = v„ and w n = w n = 1.2, L. 

Furthermore, for n = 1, 2, . . . , L, 

AV„ = -ffi'V) := » ( d + " [ o"] • < 3 63) 

where H ( n e) denotes the upper Hessenberg matrix (2.32) generated by Algorithm 2.4 applied 
to M. 

Now suppose we want to solve, using the QMR. method, m shifted linear s\ stems 

(M + < 7 jI)x U) = b, j = 1,2, . . . , m, (3.64) 

which differ only in the shifts cj. In view of Proposition 3.7. all m runs of the QMR 
Algorithm 3.1 can be based on only one run of the look-ahead Lanczos Algorithm 2.4 
(applied to M). 

A sketch of the resulting QMR process for solving (3.64) is as follows. 

Algorithm 3.8. (QMR algorithm for solving m shifted systems (3.64).) 

0) For j = 1, 2, . . . , m, set = 0 and = b; 

Set p 0 = ||6||, t'i = b/p 0 ; 

Choose w i £ C N with ||tri || = 1; 

For n = 1, 2, . . . : 

1) Perform the nth iteration of the look-ahead Lanczos Algorithm 2.4 applied to j\/. 
This yields matrices V n , Vn+i, fin * which satisfy M\' n = Vn+iHn ■ 

2) For all j = 1, 2, . . . , m for which ii ; ' has not converged yet : 

Update the QR factorization 

of Q„Hn\<Tj) and the vector t 1 /’ (cf. (3.9)); 

Compute 

xfp = X ( 0 j) + Vnil&r'tiPi 

3) If all in* have converged: stop. 

Finally, we recall (cf. (1.9)) that, for typical application which lead to shifted systems 
(3.64), the matrix M and the right-hand side b axe real, and only the shifts oj in (3.64) 
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are in general complex. Obviously, the Lanczos vectors generated within Algorithm 3.S 
are all real then, as long as one chooses uq € R' V and. in (2.27) real recurrence coefficients 
and rj n . Therefore, even in the case of complex shifts, no complex quantities occur in 

step 1) of Algorithm 3.8. 


3.6. Preconditioned QMR 

As for other conjugate gradient type methods, for solving realistic problems, it is crucial to 
combine the QMR algorithm with an efficient preconditioning technique. In this section, 
we show how to incorporate preconditioners into the QMR algorithm. 

Let A/ be a given nonsingular N x N matrix which approximates in some sense the 
coefficient matrix .4 of the linear system (1.1), Ax = b. Moreover, assume that M is 
decomposed in the form 

M = M\Mi- (3-65) 

Instead of solving the original system (1.1), we apply the QMR algorithm to the equivalent 
linear system 

A't / = b' , where A! = A/j -1 AA/ 2 1 , b' = A/j l (b — Axo), y = A/ 2 (x — xo). (3.66) 

Here x 0 denotes some initial guess for the solution of Ax = b. The iterates y n and residual 
vectors r' n = b' — A'y n for the preconditioned system (3.66) are transformed back into the 
corresponding quantities for the original system by setting 

x„ = x 0 + A/ 2 -1 y n and r„ = A/jr'„. (3.6 < ) 


For the special cases A/j = I or A/ 2 = / in (3.65) one obtains right or left preconditioning, 
respectively. 

Using (3.67), the QMR Algorithm 3.1 combined with preconditioning can be sketched 
as follows. 


Algorithm 3.9. (QMR approach with preconditioning.) 

0) Choose x 0 E C N and set r' 0 = A/j -1 (6 — Axo), po = || r ol!> = r o/po> Vo = 

Choose uq € C N with ||uq|| = 1; 

For n = 1 , 2, . . . : 

1) Perform the nth iteration of the look-ahead Lanczos Algorithm 2.4 (applied 

l ° A ') ; . (e) 

This yields matrices V n , V r ,,+i, which satisfy A'V n = V n +i H n ; 

2) Update the QR factorization (3.8) ofQ n H { n e) and the vector t n in (3.9); 

3) Compute y n = V n R~ l t n ; 

4) If y n has converged: compute x n = xq + A/ 2 1 j/n, aJid stop. 
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In the case of right or left preconditioning, Algorithm 3.9 simplifies somewhat. In 
general, however, for the QMR algorithm applied to a preconditioned system, one has to 
be able to compute M~ l z, M^ T z, M^'z, and M{ r z, for arbitrary vectors s. 
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4. Lanczos methods for complex symmetric matrices 


In this chapter, we consider the QMR method and related algorithms for the special case 
of complex symmetric matrices. Throughout this chapter, it is assumed that .4 = A T . 

4.1. The Lanczos recursion for complex symmetric matrices 

As already pointed out by Lanczos [Lan4, p. 176], work and storage of the classical 
Lanczos Algorithm 2.1 can be halved if .4 is Hermitian respectively complex symmetric, 
by choosing starting vectors so = ro respectively so = r<j. The resulting Hermitian Lanczos 
method has been studied extensively (see [GVL, Chapter 9] and the references therein). 
In contrast, the literature on the complex symmetric variant is scarce and restricted to the 
application of the algorithm to computing eigenvalues of complex symmetric matrices (see 
Moro and Freed [MF] and Cullum and Willoughby [CW, Chapter 6]). Here, we hope to 
convince the reader that the complex symmetric Lanczos algorithm, especially combined 
with look-ahead, is also very useful for solving linear systems. 

Obviously, if one chooses So = r o and sets 7 „ = /?„ in Algorithm 2.1, then all left 
and right Lanczos vectors coincide, i. e., v n = w„. Hence, Algorithm 2.1 reduces to the 
following procedure. 

Algorithm 4.1. (Classical Lanczos method for .4 = A T .) 

0) Choose r 0 G with r 0 ^ 0; 

Set v i = r 0 , vo = 0; 

For n = 1,2,... : 

1) Compute 3 n - (£„ v n ) ; 

If ft n = 0: set L = fi — 1 and stop; 

2) Otherwise, set v n = v n / /3 n ; 

3) Compute a n = v^Av n ; 

Set f’n+l = AVn CH n V n i3nVn — \. 

For the special case of Algorithm 4.1, the properties (2.5) and (2.6) in Proposition 2.2 

reduce to: 

T 

v k v n = 

and 

A' n (r 0 , A) = span{t>i,u 2 ,...,fn}, n = 1,2, . . . , L. (4.2) 

Notice that (4.1) and (4.2) just state that the Lanczos vectors v\ t ...,v n form an orthonor- 
mal basis for K n (ro,A) with respect to the (non-Hermitian) inner product 

(x,y):=y T x, x,y€C N . (4-3) 


0, if k / n, 

1, if k = n. 


k, n = 1,2, . . . ,L, 


(4.1) 


40 



We remark that (4.3) is the proper (cf. Craven [Cra]) “inner product" for complex 
symmetric matrices. Unfortunately, it has the defect that there exist vectors v € C* which 
are quasi-null [Cra], he., (v,v) = 0, but v £ 0. Consequently, as in the case of the general 
classical Lanczos Algorithm 2.1. exact and near- breakdowns in the complex symmetric 
Lanczos Algorithm 4.1 cannot be excluded. Indeed, in view of (2.8), an exact breakdown 
occurs if, and only if, one encounters a quasi-null vector v n . 

Therefore, as in the case of general non-Hermitian matrices, in order to obtain a stable 
implementation of the complex symmetric Lanczos process, one needs to use a look-ahead 
variant of the method. Clearly, for complex symmetric A and with identical starting 
vectors r 0 = so, the left and right Lanczos vectors generated by the look-ahead Lanczos 
algorithm coincide. In particular, as in the case of Algorithm 4.1, work and storage for 
the complex symmetric variant is only half of that of the look-ahead Lanczos Algorithm 
2.4 for general non-Hermitian matrices. 

A sketch of the resulting complex symmetric look-ahead Lanczos process is then as 
follows. 

Algorithm 4.2. (Sketch of the look-ahead Lanczos process for A = A T .) 

0) Choose ro £ C w with r 0 0; 

Set t>i = r 0 /||r 0 ||; 

Set VW =vi, DO) =s(V< 1 >) 7 V< 1 ); 

Set ni = 1, / = 1, uo = 0, Vo = 0, pi = 1; 

For n = 1, 2, . . . : 

1) Decide whether to construct u n +i as a regular or an inner vector 

and go to 2) or 3), respectively; 

2) (Regular step.) Compute 

v n +i = Av n - V r(,) (D (,) ) - 1 (U (,) ) r Ai; n 

-V (l ~ 1 >(D (, - 1 >)- 1 (U (, - 1) ) r At; n , 

set n /+ 1 = n + l, / = /+l, = 0, and go to 4); 

3) (Inner step.) Compute 

Un-fl = AVn C n^n (j)nl Pn)v n— 1 

- Av n , 


4) Compute p n +\ = ||u„+i||; 

If pn+i = 0 : stop; 

Otherwise, set 

v„ + ,=vn +t /p»», V'» = [v«) ».+,]. D<'» = (V<‘))WW. 
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We conclude this section with a result which further clarifies the connection of the 
complex symmetric Algorithm 4.1 with the general classical Lanczos Algorithm 2 . 1 . First, 
recall that, unlike Hermitian matrices, complex symmetric matrices do not have any special 
spectral properties. Indeed (see, e.g., [HJ, Theorem 4.4.9]), any complex .V x .V matrix is 
similar to a complex symmetric matrix. This result entails that the classical nonsymn r ric 
Lanczos Algorithm 2.1 differs from the complex symmetric Algorithm 4.1 only i.. the 
additional starting vector s 0 which can be chosen independently of r 0 in Algorithm 2.1. A 
strict statement of this correspondence is given in the following 

Theorem 4.3. Let M be a complex X x N matrix and r 0 € C^, r 0 ^ 0. 
a) There exists a complex symmetric X x N matrix .4 which is similar to M : 

M = XAX~ l where X is nonsingular. (4.4) 


O 

b) Set r 0 = X ~ 1 r 0 and s 0 = X~ T r 0 . Let v n , w n , a n , /?«, 7» respectively v n , d„, be the 
quantities generated by Algorithm 2.1 (applied to M and started with ro, so) respectively 
Algorithm 4.1 (applied to A and started with r Q ). Let L denote the termination index for 
Algorithm 4.1. Then, for n = 1,2 ,... ,L: 




fin~{n ■ 


(4.5) 


Proof. Only part b) remains to be proved. First, by means of (4.4), we rewrite Algorithm 
2.1 in terms of .4, X~ 1 v n , X T w n . By comparing the resulting iteration with Algorithm 4.1 
and using induction on n, one readily verifies (4.5). G 


4.2. A theorem on incurable breakdowns 

As seen in the previous section, complex symmetry of a matrix is not enough to exclude 
breakdowns in the classical Lanczos process. However, it is possible to use the complex 
symmetric structure to derive a criterion for the occurrence of incurable breakdowns. 

In the following, it is assumed that A is diagonalizable. Then (see, e.g., [HJ, The- 
orem 4.4.13]), .4 has a complete set of orthonormal (with respect to (4.3)) eigenvectors. 
In particular, r 0 can be expanded into eigenvectors of A. More precisely, by collecting 
components corresponding to identical eigenvalues, we get 

l, 

r o = 5>/u, (46) 

/=1 ^ 

where pi / 0, Au\ = A /u/, and, if / ^ J, A / ^ A ; , ufuj = 0. 

Here, £+ = Li = L r is just the grade of ro = so with respect to -4, as defined in (2.3) and 
(2.4) 
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Notice that, unless all eigenvalues of .4 are distinct, quasi-null vectors u/ ma\ occur 
in (4.6). In view of the following theorem, this is equivalent to an incurable breakdown. 
Recall from the discussion in Section 2.2 that an incurable breakdown occurs if, and onl\ 
if, nj < I* in (2.15). 

Theorem 4.4. Let A = A T be a diagonalizable N x N matrix and r 0 6 C' . Then, no 
incurable breakdown can occur in Algorithm 4.2 if, and only if, the eigeni ectors in the 
expansion (4.6) of r 0 satisfy 

ufut^O for all /= ( 4 -<) 

Proof. We need to show that (4.7) is equivalent to the existence of a regular FOP of 
degree L* — 1 with respect to the inner product (2.11) (where now so = r o)- By part b) 
of Proposition 2.3, a regular FOP of degree m exists iff the corresponding moment matrix 
M m := (pj+i)j,i=o,...,m-i is nonsingular. By (2.11) and (4.6), we have 

L * 

Pj = rj A ’tq = ^2 p] Xjufui, j = 0,1, (4-S) 

l=i 

Moment matrices are in particular Hankel matrices. By applying Kronecker s Theorem 
on the rank of infinite Hankel matrices [Gan, pp. 204-207] to Moo := o,i,...i it 

follows that 

rank Moo — rank \l m - rankA/^_i = L for all m > L — 1. (4.9) 


where L is the number of poles of the rational function 


OO 

/w = E 


>=o 


N 

zJ+ 1 ' 


Using (4.8) and £°Z 0 A {/z> +l = 1 /(z - A,), one obtains the following expansion of /: 

f(z ) = V — — U - for all |z| > max |Aj|. (4.10) 

/=1 

In particular, by (4.10), L < L. with equality holding iff (4.7) holds true. Hence, in view 
of (4.9), Ml'-i is nonsingular iff (4.7) is fulfilled. This concludes the proof. Q 

As mentioned, (4.7) is guaranteed if A has only simple eigenvalues. Thus we have the 
following 

Corollary 4.5. If A = A T isanNxN matrix with N distinct eigenvalues, then incurable 
breakdowns cemnot occur in the complex symmetric look-ahead Lanczos Algorithm 4.2. 
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4.3. QMR and related algorithms for complex symmetric matrices 

For the QMR approach, one can exploit the complex symmetry of .4 by setting 


s 0 = r 0 (4.11) 

and basing it on the complex symmetric look-ahead Lanczos Algorithm 4.2. We stress 
that, due to the lack of a criterion for the choice of so, one usually sets so = r 0 anyway. A 
sketch of the resulting complex symmetric QMR method is as follows. 

Algorithm 4.6. (QMR algorithm for A = A T .) 

0) Choose x 0 E C N and set r 0 = b - Ax o, po = J|r- 0 j| , Ui = r 0 /po; 

For n = 1, 2 , . . . : 

1) Perform the nth iteration of the complex symmetric look-ahead 
Lanczos Algorithm 4.2. This yields matrices V n , V„ +1 , 

which satisfy AV n = V n +iHn ; 

2) Update the QR factorization (3.8) ofQ n Hn * and the vector t n in (3.9); 

3) Compute x n = x 0 + V n R~ l t n \ 

4) If x n has converged: stop. 

Due to the savings for the complex symmetric Lanczos Algorithm 4.2, work and storage 
requirements for Algorithm 4.6 are also roughly halved, compared to the general QMR 
Algorithm 3.1. In particular, Algorithm 4.6 only requires one matrix- vector product .4 • v 
per iteration, as opposed to the two products A ■ v and AT • w per iteration for the QMR 
approach for complex nonsymmetric matrices. 

Obviously, the complex symmetric QMR Algorithm 4.6 can also be used in conjunction 
with a preconditioner (cf. Section 3.6). Again, work and storage per iteration is roughlv 
halved, provided one chooses a complex symmetric preconditioner M decomposed in the 
form 

M = Mi M 2 where M 2 = Mj (4.12) 

in (3.65). Note that standard techniques, such as incomplete factorization [MvdV] or 
SSOR preconditioning (see, e.g., [FNl]), applied to A = A T generate complex symmetric 
preconditioners which satisfy (4.12). 

Finally, we remark that a simpler variant of the complex symmetric QMR method, 
based on the classical Lanczos Algorithm 4.1 rather than the look-ahead Lanczos Algorithm 
4.2, is discussed in detail in the author’s paper [Fre4]. 

In analogy to the complex symmetric variant, Algorithm 4.1, of the classical Lanczos 
Algorithm 2.1, the general BCG Algorithm 3.3 reduces to a scheme which requires only 
half the work and storage, if the starting vectors are chosen as in (4.11). The resulting 
procedure is as follows. 


44 



Algorithm 4.7. (BCG for .4 = A T .) 

0) Choose xo £ C iV ; 

Set q 0 = r 0 = b — .4xo ; 

For n = 1,2,... ; 

1) Compute S n = r^_ l r n - ] /q^_ l Aq n -i and set x n = x n _i + 8 n q n - it 

Set r n — r n _j 6 n Aq n — \j 

2) Compute p n = r^r„/r^_, r n _! ; 

Set q n = r „ -|- p n q n —\i 

3) If r n = 0: stop. 

However, as for the complex symmetric Lanczos Algorithm 4.1, breakdowns in Algorithm 
4.7 cannot be excluded. Indeed, both kinds of breakdowns described in Section 3.3 can 
occur in the complex symmetric BCG method. 

Closely related to the BCG method for general linear systems (1.1) is the conjugate 
gradients squared algorithm (CGS) due to Sonneveld [Son]. 

Algorithm 4.8. (CGS for general A.) 

0) Choose xq £ C N and so £ C' v , so ^ 0; 

Set po = uo = r 0 = b — .4xo and compute s£r o. 

For n = 1 , 2, . . . ; 

1) Compute a* = Sq r k _ x / Apk-\ and set q k = u*_ x - a k Ap k - X ; 

Set x k =z x k -i + a k (u k -i + qk) and r k = r k - X - a k A(u k - X 4- qk); 

2) Compute (3 k = sjr k /s$r k -i; 

Set u k = r k + 0 k q k and p k = u k + 3 k (q k + 3 k p k - X ); 

3) If r n = 0: stop. 

Notice that, like general BCG, CGS has a second unspecified starting vector so- However, 
unlike BCG, even with the special choice so = r 0 , CGS cannot exploit the complex sym- 
metry of A. In particular, for A = A T , Algorithm 4.8 requires per iteration about twice 
as much work as the QMR and BCG Algorithms 4.6 and 4.7. 

Finally, as a special case of the general connection [Son] between the CGS and BCG 
approaches, we have the following 

Proposition 4.9. Let A = A T , r 0 = = rf 015 , and, in Algorithm 4.8 , s 0 = r 0 . Then, 

for n — 0, 1, . . ., 

r BCG _ $ n (' J 4) rQ and r^° s = (<J> n (A)) 2 r 0 
for some £ n„ with $ n (0) = 1. 
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5. CG-type algorithms and polynomial preconditioning for shifted Hermitian 
matrices 

In this chapter, we consider CG-type methods for the solution of linear systems (1.1) with 
coefficient matrices of the form 

A = T + iol where T = T H is Hermitian, a € R. (5.1) 

Clearly, by multiplication of the right-hand side b or the unknown vector x by e the 
more general case (1.4) can always be reduced to (5.1). Although our main interest is in 
non-Hermitian A , we include the case o = 0 and assume that A = T is nonsingular then. 
This guarantees that A is always nonsingular, and the exact solution of -4x = b is denoted 
by x* = A~*b. Most of the results in this chapter are taken from the author s paper [Fre3] 
on shifted Hermitian matrices. 

5.1. Three CG-type approaches 

We will consider three different CG-type approaches. Recall (cf. Section 1.2) that, for 
shifted Hermitian matrices, it is possible to have an ideal CG-like method with iterates 
characterized by the minimal residual (MR hereafter) property (1.3). The first approach 
we study is the MR method based on (1.3). The second scheme is the GAL method 
which aims at computing approximations x n defined by the Galerkin (GAL hereafter) (or 
orthogonal error [FM2]) condition 

v H (b - ,4x n ) = 0 for all v € K„(r 0 . .4), x n 6 xo + A n ( r o > -4). (5.2) 

Note that, for Hermitian positive definite .4, this method is equivalent to the classical CG 
algorithm (see. e.g., [PS]). While MR and GAL are standard approaches for non-Hermitian 
matrices, the third method we propose is less conventional. Its iterates are defined by the 
minimal Euclidean error (ME) property 

||x. — x n || = min ||x* — x|| , x n 6 x 0 + A n (-4 w r 0 , .4). (5.3) 

11 " x£zo+K n {A H r 0 ,A) 

Note that for the Krylov subspace in (5.3) one has the identity 

K n (A H r 0 ,A) = A H Kn(r 0 ,A), (5-4) 

since matrices (5.1) are normal and thus 

AA h =A h A. (5.5) 
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We remark that MINRES and SYMMLQ [PS] are numerically stable implementations 
of the MR and GAL methods, respectively, for real symmetric matrices A. If .4 is indefinite, 
a Galerkin iterate satisfying (5.2) need not exist for every n. Paige and Saunders resolve 
this problem in SYMMLQ by actually working with a sequence of well-defined auxiliary 
vectors from which the existing Galerkin iterates can then be computed in a stable manner. 
The ME approach (5.3) is a generalization of Fridman’s method [Fri] for real symmetric 
matrices A. However, the algorithm he proposed is numerically unstable (see [Frel, SF] 
for an explanation of the instability and a simple remedy). Fletcher [Fie] showed that the 
sequences of the Fridman iterates and the auxiliary vectors generated by SYMMLQ are 
mathematically equivalent. Therefore, as a by-product, SYMMLQ also yields a numerically 
stable implementation of Fridman’s method. 

We now turn to the derivation of algorithms, modeled after SYMMLQ and MINRES, 
for the actual computation of the iterates defined by (1.3), (5.2), and (5.3). The main 
ingredient is the Hermitian Lanczos algorithm [Lanl] applied to the Hermitian part T of 
(5.1) and with ro as starting vector. 

Algorithm 5.1. (Hermitian Lanczos method.) 

0) Set vi = r 0 , v 0 = 0; 

For n = 1,2,... : 

1) Compute 0 n = || t? n ||; 

If 0 n = 0: set L = n - 1, v L +\ = 0, and stop; 

2) Otherwise, set v n = v n /fin> 

3) Compute a n = v„Tv n ; 

Set 5 n+ i — Tv n — <y n v n - fin^n - 1 • 

Notice that Algorithm 5.1 is just a special case of the classical Lanczos Algorithm 2.1 
(applied to T and with starting vector s 0 = f 0 ). However, unlike the general Algorithm 
2.1, the Hermitian Lanczos process cannot break down prematurely. In exact arithmetic. 
Algorithm 5.1 stops after a finite number of steps with termination index 


L = dim I\N(r 0 , .4) = dim Ji/v(r 0 , T). 


Moreover, with V n defined as in (2.1) and 


T n 


’ Oc i 02 0 ' ' ' 0 

02 '• '• : 


0 

0 


•. o ’ 

••• fin 

0 fin . 


(5.6) 


(5.7) 
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for the Hermitian Lanczos method, the properties (2.5)-(2.7) listed in Proposition 2.2 now 
reduce as follows: 

V n H V n = (5-S) 

TV n = V n T n + [0 0 0 (5.9) 

K n (r 0 ,A) = 7\ n (r 0 ,T) = spanf^, v 2 , ■ ■ ■ (5.10) 

Here and in the sequel, it is always assumed that n G {1,2,...,!}. Note that, with 


H <■>« 


T n + io I n 

/W? J 


(5.11) 


and by adding iaV n to both sides of (5.9), we obtain 

AV n = V n+1 tf£ e) . 


(5.12) 


Next, we rewrite the MR, ME, and GAL conditions in terms of V n and Hh \ In order 
to match the notations used in Chapter 3, we set po = 0i, and thus 


ro=PoVi, Po=Pi = lko||- 


(5.13) 


Proposition 5.2. 

a) x‘l* R = i 0 + Vkz‘k fR where zjf m is the solution of the least squares problem 



b) x™ E = xo + A H V n z^ !E where z„ E is the solution of 

Poe<”» = (*<'> )"*'*>*. 


c) x^^ — xq + V n z„ M ' where z^ 1 is the solution of 

p Q e[ n) = (T n + ioI n )z. 


(5.14) 


(5.15) 


(5.1G) 


Moreover, if a = 0 and T n is singular, then no Galerkin iterate satisfying (5.2) exists. 

_n -MR _ -ME — -GAL _ T 
d) X L - x L - x L — X*. 

Proof. First, note that, by (5.13) and (5.8), 

Vf r 0 = poe[ j \ j = 1,2,. .. ,L + 1. (5-17) 

Using (5.12) and (5.8), we obtain 

V n H A H AV n = (H ( n e) ) H H ( n e) . ( 5 - 18 ) 
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a) From (5.13) and (5.12), r n can be represented as in (3.3). With (3.3) and (5.S), it 
follows that the MR property (1.3) is equivalent to (5.14). 

b) (5.4), (5.10), and (5.5) imply that 

X n = Xq 4“ .4 Vn^n, r n = Tg cl A Vn^m with € C . 

The minimization property (5.3) is equivalent to 

0 = v H .4(x, - x n ) = v H r n for all v € A\,(ro,-4). 

By (5.10), it suffices to consider these equations for v = vj, j = l,...,n, and it follows 
that z„ is the solution of 

V n H r 0 = V n H A H A V n z 

which, by (5.17) (for j = n ) and (5.18), is just the linear system (5.15). 

For c), we similarly obtain that x n = xq 4- V n z n satisfies (5.2) iff z n solves the linear 
system 

Poe, = V«V„^H^z 

whose coefficient matrix, by (5.8) and (5.11), is T n -T iol n . If u = 0 and T n is singular, 
the linear system (5.16) could have a solution only if it was consistent. Using the fact that 
T n -! is nonsingular then and /?„ > 0, one easily verifies that this cannot be the case. 

d) In view of (5.6), Ki = A'£,(r 0 ,.4) is an ^-invariant subspace and. since ?' 0 € I\l, 
we conclude that 

x* - xo = .4 _1 r 0 € A~ y I<i = Kl — A H I\l- 

On the other hand, x» trivially satisfies (1.3), (5.2), and (5.3), and it follows that xi — x, 
for all three methods. Q 

5.2. Practical implementations 

First, consider the MR approach. By comparing (5.14) with (3.6), we conclude that, for 
shifted Hermitian matrices (5.1), the MR and the QMR methods are identical, provided 
one sets = I n +\ in (3.6) and the QMR Algorithm 3.1 is based on the Hermitian Lanczos 
Algorithm 5.1. Therefore, an actual MR algorithm for matrices (5.1) can be obtained as a 
special case of the implementation of the general QMR method described in Sections 3.1 
and 3.2. Here, we present a slight modification of the resulting implementation which will 
help to reduce complex arithmetic. 

Since the Lanczos matrix T n in (5.7) is real symmetric, it follows that 

= Tl + <7 2 /„ + /?’ +1 e„e£ 
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is a real matrix. Consequently, one can choose the unitary matrix Q n in a QR decompo- 
sition 


= QS 


R n 
0 


(5.19; 


of the complex matrix (5.11) such that the upper triangular factor R n is real. Lsing 
standard matrix calculus, one verifies that a factorization (5.19) with real R n can be 
constructed with a unitary matrix Q n of the form 


Qn — G n D n 


Gn - 1 0 
0 1 


D n ~\ ■ • • Di 


G 2 0 
0 In- 2 


d 2 


Gj 0 

0 i 


D x 


(5.20) 


with complex diagonal matrices 

Dj = diag(l, . . . , l,e’ Vi , 1, . . . , 1), ^ 6 R, 

T 

j 

and real Givens rotations 

Ij-i 0 0 

0 Cj Sj 

L o ~ s ) c jj 

Recall that for the QR factorization in Section 3.2 we have used slightly different unitary 
matrices Q n (cf. (3.12) and (3.13)). Also, note that, in contrast to the Lanczos matrix 
venerated bv the look-ahead Lanczos process, is now a scalar tridiagonal matrix. 

Hence, the upper triangular R n in (5.19) is of the form 




with Cj 6 C, Sj E C, c* + s 2 j — l. 


R n = 


^2 

0 s 2 


9, 

*3 


0 


0 


0 


0 

0 

9 n 

6n 


(5.21) 


with scalar entries 6 k , £*, 9 k (cf. (3.14)). Moreover, the factorization is easily updated 
from the one of the previous step n — 1 by simply setting 

Q n = S n _ 2 $ n , e n = Sn-l^n + C n -\C n - 2 0 n COS(^„_i, 

h n = —s n -ic n - 2 i3 n e -|- c n _i(a„ — i<r), S n = |/i n |i 


(5.22) 


i Vn _ f h n /\h n \ if h n ± 0, 
6 0 if h n — 0, 


and 


(5.23) 


Sn = d* ^n+H c n — ^ n/Sni s n — fln+ l/^n- 

Based on the QR decomposition (5.19), one then proceeds as in the derivation of the 
implementation of the QMR method in Section 3.2. We omit the details and only state 

the resulting algorithm. 
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Algorithm 5.3. (MR method for matrices (5.1).) 

0) Choose io € C' N and set v = b — .4xo> u o = Po — P-i = 0, 

/?! = T 1 = INI, Co = c- 1 = 1, So = -5-1 = ‘r’o = O' 

For n = 1, 2, . . . : 

1) If 0 n = 0, stop: r n — i solves Ax = b; 

Otherwise, compute 

2) v n = v/p n , a n = v^Tv n , 

v = Tv n — a n v„ — 3 n v n -\, /? n +i - IMI> 

and then 6 n , e„. 8 n , <p n , c„, s„ using formulas (5.22), (5.23); 

3) Pn = (t’n ^nPn — 1 ^nPn-2 )/^ni 

X n — X n _ 1 4“ T nPn with T n = C n T n t n , 

r„ + 1 = -s n f n e'” . 


We now turn to the ME and GAL methods. First, note that the characterization 
(5.2) of the GAL iterates is just a special case of the Galerkin type condition (3.33) (with 
s 0 = fo). Hence, as a special case of the results in Section 3.3 on the connection between 
QMR and BCG, we can obtain a stable implementation of the GAL approach based on 
the MR Algorithm 5.3. Instead, we now derive an implementation of the ME approach 
and show how the GAL iterates can be recovered from the ME method. 

With (5.19) and by setting 

Y n = [ Vl rn y„]):= A H V n R-\ 


it follows from part b) of Proposition 5.2 that 

x ME _ Jq Y n u n where u n is the solution of fi\e\ = R n u. 

Similarly, using that, by (5.11), 

T ’ r “ 

T n + icrln = (#i e) ) n o 

and with (5.19) and (5.20), one deduces from (5.16) that x exists if, and only if, c„ ^ 0 


and then 


x^ =xo+ Y n u n , Y n := V n Ql_ 1 diag(l, 1, . . . , 1, c‘ V " ), 


where u n is the solution of 

P\t\ = R^diag(l,...,l,c„)u. 

Clearly, u n and u n differ only in their last elements and fj n . Moreover, w ith (5.12), 
(5.19), and (5.20), one easily verifies that Y n is identical to Y n up to its last column y n - 
Hence, we obtain the recursions 

x n E - x n-\ + VnVn and, if c„ ^ 0, z™ = x- + fj n y n (5.24) 

(cf. [PS]). The resulting implementations can be summarized as follows. 
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Algorithm 5.4. (ME/GAL method.) 

0) Choose x 0 6 C V and set x(; ,E = x^ = x 0 , v = b - Ax 0 , v 0 = yo = 0. 

3 1 = ||u||, rjo = — 1. c 0 = c - 1 = 1. 5o = -S-l = r’o = »7-l = 

If 3 1 > 0, set r i = v/3i ; 

For n = 1 , 2 ,... : 

1 ) If 3 n = 0, stop: x^J—\ = Xn'^i = x * solves Ax = b; 

Otherwise, compute 

2) a„ = v^Tv n , 

v = T v n cn n v n — 3 n v n _ i , 3 n + 1 = | |f 1 1 , 

and then 9 n , e„, S n , S n , <Pn, c n , s n using formulas (5.22), (5.23); 

3) y n = e‘ v " (-s n _iy„_i 4- c n -iv n ) 

and, if <5 n ^ 0 and the Galerkin iterate is desired, 
x% AL = + ij n y n with fj n = -(e n 77 n _i + 9 n r) n ^ 2 )/K; 

4) Set v n+1 = v/3„+ 1 , if Pn+i > 0, and u n+1 = 0 otherwise, 

J In = C-nVn H~ + 

X™ E — *J« 4- r? n y n with y n = ~(e n Vn-i + 9 n rj n - 2 )/6 n . 

The finite termination property of the Lanczos algorithm does no longer hold in the 
presence of roundoff error (see, e.y., [GVL, pp. 332]), and the stopping criterion stated in 
Algorithms 5.3 and 5.4 is not useful in practice. Instead, one should terminate the iteration 
as soon as ||r n || is sufficiently reduced. Note that, similar to the real symmetric case 
[PS], ||r„|| can be obtained without computing the vector r n itself by using the following 
identities: 

ll4 ,E ll = \J r tl+\ s l + 1 + nX+2. 

ll r i , ' , ll = l|rolM2 

1 1 r 1 1 = "1" C n — \Tj n c'^ J. 

Finally, consider linear systems Ax = b with coefficient matrices .4 of the more general 

class 

A = T + wD where T = T H is Hermitian, a G R, 

with D a Hermitian positive definite N x N matrix. Then, Ax = b is equivalent to the 
linear system 

A'x' = b' where A' = D~'> 2 AD~'/\ x'=D‘' ! x, b‘ = 

whose coefficient matrix A 1 is now of the form (5.1), so that we can use Algorithm 5.3 
or 5.4 for its solution. Note that one never needs to form A* and 6 explicitly, and it 
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is straightforward to rewrite both Algorithms 5.3 and 5.4 in terms of the original linear 
system Ax = b. We omit the details and only state that the resulting MR, ME. and GAL 
algorithms generate iterates which are characterized by the properties (1.3) (with j| || = 
|| || D -i and K n (D-'ro,D~ l A )), (5.3) (with || || = || || D and K n (D~ ] A H D~ } r Q , D~ l A)). 
and (5.2) (with K n {D~ l r 0 , D~ l A)), respectively. 

5.3. Comparisons with other implementations. Operation counts 

Several authors [JY, Sid, AMS] have proposed algorithms for the computation of the MR 
and GAL iterates (1.3) and (5.2). respectively. However, most of these implementations 
(like Orthomin and Orthores in [JY]) are modeled after variants of the conjugate residual or 
conjugate gradient algorithm for Hermitian positive definite matrices. It is well known [PS, 
Cha, SF] that, for Hermitian indefinite A, these approaches are numerically unstable and 
can even break down. For instance, for the GAL method this occurs whenever a Galerkin 
iterate does not exist (cf. [PS] and part c) of Proposition 5.2.). The same stability problems 
can arise for the non-Hermitian matrices (5.1) if o is small. Hence, all these algorithms 
derived directly from the positive definite case are stable only for matrices (5.1) which 
fulfill additional requirements such as T positive definite or |oj bounded away from 0. 
Note that these two conditions are not satisfied for most of the applications mentioned in 
Section 1.3. 

Here, we consider only implementations which are numerically stable for the general 
class of matrices (5.1). Among the proposed algorithms in the literature merely the Or- 
thodir approach [JZ, AMS] for the computation of the MR iterates has this property. This 
algorithm can be stated as follows. 

Algorithm 5.5. (Orthodir MR implementation.) 

0) Choose Xo € C' v and set sq = r 0 = b — Axo, 
qo = Aso, S-i = <?-i = 0, vq = 0; 

For n ~ 0, 1, . . . : 

1) If q n = 0, stop: x n solves Ax = b; 

Otherwise, compute 

2) A„ = qnr n /\\q n \\ 2 , 

^n-fl = ~t" = r n ^n?nj 

3) Pn = qnTqn/\\q n \\ 2 and, if n > 0, V n - ||9n|| 2 /!kn-l|| 2 , 

•Sn-t-1 = q*i (/^fi d” ^ O' )s n ^n^n — 1 , 

<7n + 1 = Y q n p n q n ^n^n— 1 * 

We remark that q n = As n and that the search directions s„ are up to scalar factors 
identical to the vectors p n in Algorithm 5.3. 

Next, the results of operation counts for Algorithms 5.3, 5.4, and 5.5 are presented 
in Table 5.1. Although we solve complex linear systems, most of the scalars (like a„ and 
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J n in the Lanczos step of Algorithm 5.3 and 5.4) occurring in the computations are real. 
Moreover, on some machines, implementations in real arithmetic are more advantageous. 
Therefore, we compare work and storage in terms of real quantities. Listed are the number 
of matrix-vector products T ■ v, v € R^, the approximate number m of additional real 
multiplications per iteration, and the number s of real vectors (of length N ) to be stored. 
The computation of inner products often constitutes a bottleneck on modern computers. 
For this reason, we also give the number dp of dot products x ■ y, x,y €.R l per iteration. 
Finally, notice that — based on the simple observation stated in Proposition 5.6 below 
— work and storage for the MR and ME/GAL methods can be significantly reduced if 
the Hermitian part T of the matrix (5.1) is real. This case occurs frequently in the cited 
applications, and we included the corresponding operation counts in Table 5.1. 

Proposition 5.6. Let T be real and assume that r 0 = b-Ax 0 (E R^. Then, all the vectors 
v n , n = 1,2, . . ., in Algorithm 5.3 and 5.4 are real. In addition, for the MR method, all 
search directions p„ are real vectors. 

Note that often the right-hand side b is a real vector, and then the standard starting 
guess xo = 0 guarantees that r 0 is real. In the general case b E and if cr 0, the 
condition ro £ R^ can always be fulfilled by choosing the starting vector xq = Xq 4- ix q 
appropriately, e.g ., Xq 2 ^ = 0 and Xq 1 ^ = Im6/<7. However, such a strategy might not be 
desirable, if one already knows a good approximation xq for the exact solution of .4x = b. 



T- v 

m 

dp 

S 


MR Algorithm 5.3 

2 

18N 

4 

12 


ME/GAL Algorithm 5.4 

2 

18N 

4 

10 


Orthodir Algorithm 5.5 

2 

26N 

' 8 

14 


If T and vq are real: 

MR Algorithm 5.3 

1 

9N 

2 

7 


ME/GAL Algorithm 5.4 

1 

13N 

2 

7 


If A = T and r 0 are real: 

MINRES [PS] 

1 

8N 

2 

6 


SYMMLQ [PS] 

1 

8N 

2 

5 



Table 5.1. Work per iteration and storage for the various algorithms. Listed are 
the number of matrix-vector products T • t>, v 6 R^, the approximate number m 
of additional real multiplications, the number of real dot products dp, and the 
number s of real vectors to be stored . 
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To explain the numbers given in Table 5.1, a few more comments are necessary. For 
the ME/GAL algorithm, we have assumed that the Galerkin iterate is, if desired, only 
computed in the very last step of the iteration. Furthermore, in order to reduce the 
computational work, note that, in the MR Algorithm 5.3, one computes the vector 8 n p n 
instead of p n . Similarly, in part 4) of Algorithm 5.4, the vector y n itself is never needed 
and, hence, r] n y n is generated directly. Moreover, using fast Givens rotations ( e.g . [GAL. 
p. 158] ), we compute the rescaled vector f n Vn instead of y n in step 3) of Algorithm 5.4. 
Here, /„ := l/(c n _i cos </?„) for the case that s n -i < c„_i and [ sinv?„| < | cos t p n \ > an d fn 
is defined correspondingly for the remaining cases. Note that then only 4n real multipli- 
cations are needed for updating f n y n from /n-iyn-i and v n . 

We conclude this section with a few further remarks. First, Table 5.1 clearly shows 
that the MR implementation stated in Algorithm 5.3 is less expensive than the Orthodir 
Algorithm 5.5. For real symmetric linear systems, Algorithm 5.3 and 5.4 reduce to MIN- 
RES and SYMMLQ [PS], respectively. Notice that, for the case of complex matrices 
(5.1) with T and ro real, Algorithm 5.3 and 5.4 require only little extra work and storage 
compared to MINRES and SYMMLQ. Finally, consider real linear systems with matrices 

A = I — S where 5 = — S T is real and skewsymmetric, (5.25) 

(or, equivalently, A' = iA = T + i/ with T = — iS — if rewritten in the form (5.1)). 
Concus, Golub [CG], and Widlund [Wid] were the first to propose a Galerkin type method 
for the class of matrices. It can be shown, that their algorithm is equivalent to the Galerkin 
part of Algorithm 5.4 for the special case (5.25). Also, note that, in [Frel. Sto], we have 
investigated an Orthodir type implementation of the ME approach for the class A = I-S. 
The first MR-type algorithm for the family of matrices (5.25) was proposed by Rapoport 
[Rap] (see also [EES, Frel] for different implementations). 

5.4. Error bounds 

In this section, we derive error bounds for the MR and ME methods. Let o < A m j n (T) and 
3 > A max (T) be given bounds for the extreme eigenvalues of T. Therefore, all eigenvalues 
of A are contained in the complex line segment S ■ — [o + + i<y\. For the rest of 

this chapter, we assume that in the Hermitian case a = 0, A = T is positive definite and 
0 < a < (3. This guarantees that 0 ^ S'. 

By the standard technique, using 

K„(r 0 ,A) = {*(A)r 0 | * € n^} (5.26) 

and an expansion of r 0 into orthonormal eigenvectors of A (recall that, by (5.5), .4 is 
normal!), one obtains from (1.3) the estimate 

Ik^H/IMI < min max |1 - A*(A)|. (5-27) 

V€lln-i 
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Similarly, with (5.4), (5.26), we deduce from (5.3) that 
||i* - x 

With the linear transformation 


.ME I 

" n 1 


x. -x 0 || < min max |1 — |A| 2 ^( A)|. 
011 “ *€n„-i xes 


z = z(A) = 


2(ia — A) + 0 + « 
(3-a ’ 


(5.2S) 


(5.29) 


which maps S onto the unit interval [—1, 1], the right-hand side of (5.27) can be rewritten 
in the form 

(E n (a) :=) _ min 


where 


a := 


2icr + j3 + a 
(3 — a 


max |$(z)| 

(5.30) 

* [-Ml- 

(5.31) 


Furthermore, using the identity 

4 |A| 2 = (/? - a) 2 (z( A) - a)(z(A) - a), A G 5, 

(note that z(A) = z(A) for all z G 5) one easily verifies that the upper bound in (5.23) is 
just E^i(a) where 


(4 r) (a) : =) min max |$(*)|, 
4»en„(a) 


(5.32) 


n„(a) := {$ G n n | $(<z) = $(d) = 1 and, if a G R, $'(a) = 0}. 

We now turn our attention to the two approximation problems (5.30) and (5.32). It 
will be convenient to represent a in the form 

a = a(xp) = a(R) cos xp + ib(R) sin 0, R > 1, 0 < v < 2tt, (5.33) 

a(R) := i(/J + i), b(R) := '-(R - | ); 

clearly, this is possible for any a [—1,1]. For fixed R > 1, we set Br = {a — a(b)|0 < 
0 < 27 r} and remark that Br — OSr just describes the boundary of the ellipse Sr (defined 
as in (3.59), with r replaced by R) with foci at ±1 and semi-axes a{R), b(R). 

First, we consider the complex approximation problem (5.30). Its solution is classical 
for the case of real a where T n (z)/T' n (a) is the optimal polynomial. Here, T n denotes the 
nth Chebyshev polynomial which, by means of the Joukowsky map. is given by 


x.M s !<»■ + sr). ^ b” + b 


(5.34) 


2 V ~ ' v n /J 2' v 

For purely imaginary a, the extremal polynomials were found by Freund and Ruscheweyh 
[FR], but for general complex a the solution of (5.30) is not explicitly known. The following 
upper bound for the optimal value of (5.30) will be used in Section 5.5. 
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Theorem 5.7. Let R > 1 and n = 1,2,.... Then, 


1 p ^ ^ 2 
— < E n (a ) < i 


a G Br. 


(5.35) 


Proof. The lower bound follows immediately from an inequality due to S.N. Bernstein ( e.g . 
[Mei, Theorem 74 ]). The upper bound is just the special case r = 1 of Theorem 3.6. 0 

We remark that, for fixed R > 1, the upper bound in (5.35) is optimal, with equality 
holding for the two read points of Br. The optimal lower bound is unknown, but it is 
conjectured to be 2/(R n + R n ~ 2 ) which is just the optimal value of (5.30) for the two 
purely imaginary points of Br (cf. [FR]). 

Next, we study the approximation problem (5.32), and we will show that it is closely 
related to the classical Zolotarev problem 


min max \z n + qnz n 1 — 'l , (z)|, 77 G R, n = 2,3,... . (5.36) 

't'en „_ 2 *€(-i,i) 

It is well known that there always exists a unique best approximation 'Sf n {z\ r )) f° r (5.36) 
and the corresponding polynomials 

Z n (z\rj) = z n + r/nz n ~ l - rj G R, n = 2,3, ..., 


are called Zolotarev polynomials. We refer the reader to [CT] for a detailed study of these 
polynomials. Note that 

Z„( 2 ;T 7 ) = 2 1 -"(l + |r ? |) n T n (^±^j) for M < tan 2 (5.37) 


and for the remaining values of rj there are representations of Z n {z\ rj) in terms of elliptic 
functions. 


Theorem 5.8. Let a = a(p) G Br, R > 1, n = 2, 3, . . . . Then, there exists a unique 
optimal polynomial $„(z; a) for (5.32). If rj) = ]ir/{n - 1) with an integer j ^ Omod n — 1, 

then T ( ' \ 

$„(*;<>) = ani 4 r, («) = 


T„_,(a) 


R n ~ l +1/R 


n — 1 


Otherwise, 


$ n (2;a) 


where rj = 77 (a) is the unique solution of 


Zn(a;rj) 


(5.38) 


lmZ n (a\rj) = 0 ( respectively Z' n (a\Tj) = 0, if a G R), rj G R. (5.39) 
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In particular , if ip satisfies for some integer j ^ Omodn 


rj sin' 


UL 

n 


i J n 

cos rp = cos ■ . )7r 

n a(R) 4- sign rj cos ^ 


with |r;| < tan 2 — , rj G R. 
2 n 


then 


r»(f4nn) 


*»(*;<■) = — krM- and 4 r, (o) = 


with p defined by 


T( Q + T? ) 

n( i + M j 


1, i,_ , m M ML i 

2 P p Q 1 + \v\ a(R) + sign rj cos ’ 


p" + 1 /p n 


(5.40) 


(5.41) 


KRf 


(5.42) 


Proof. Writing $ G II n (a) in the form $( 2 ) = 1 — (2 — a)(z — d)'l '( 2 ), 'P G n n _ 2 , 
one recognizes (5.32) as a linear Chebyshev approximation problem, for which, since a 0 
[-1,1], Haar’s condition is satisfied. Standard results from approximation theory (see, 
e.g., [Mei]) guarantee that there always exists a unique optimal polynomial $ n ( 2 ;a) for 
(5.32). Moreover, because of the symmetry of the problem with respect to the real axis, 
$ n is a real polynomial, and <£„ is characterized by assuming its maximum absolute value 
at at least n points in [—1, 1] with alternating signs. This alternation property implies that 
has degree n — 1 or n. First, consider the case n — 1. Since the scalar multiples of T n -\ 
are the only polynomials of degree n — 1 with an alternating set of length at least n, we 
conclude that <$ n (z;a) = T n -\(z) /T n ~\ (a), and, in view of € IT n (a), this case occurs 
iff T n -i(a) G R and a £ R. With (5.33) and (5.34), one readily verifies that these are just 
the points a - a(ip) with ip = jtt/{n - 1), j ± Omodn - 1. Now we turn to the case 
that $ n is of degree n. Since the optimal polynomials for the Zolotarev problem (5.36) 
are characterized by the same alternating property as $ n , it follows that <£,1 is of the form 

(5.38) with a suitable 77 G R- In order to guarantee G II n (a), ij must be the solution of 

(5.39) . 

Now, let Tj G R, [ 77 1 < tan 2 ^ . With (5.37) and (5.34), we conclude that a satisfies 


(5.39) iff 


(a :=) 


a + Tj 

r+H 


1, In j* 1 , . 

= oO’+t) 005 — + 9 (/>--) sm lT 

2 0 TX 2 0 Tl 


(5.43) 


p ' n l p 

for some p > 1 and some integer j ^ Omodn. By using the representation (5.33) of <2 and 
by equating the real (imaginary) parts of (5.43), one arrives at two real nonlinear equations 
for the unknowns cos xp and p, and a straightforward, but lengthy calculation shows that 
the solutions are given by (5.40) and (5.42). Finally, note that the first identity in (5.41) 
is a consequence of (5.38) and (5.37); the second one follows from E n (n) = l/|^n(d)| and 
(5.43). 0 

For general a, (5.38) and (5.39) lead to rather complicated and not very useful formulas 
for E ( n r) (a) in terms of elliptic integrals. Next, we derive simple bounds for this quantity. 
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Theorem 5.9. Let R > 1 and n = 2, 3, . . . . Then, for a = a(if ) € Br: 


< En\«) < 2 


R n + 1/R n 


^2n-l(-R) + bi(R)f2n-l(ll’) 


(=■ Bl r, (< 


(5.44) 


where 


b,(R) = \(R> - ±-), /,« = 


sin(_;^)/ sin tf 


if sin if 7 ^ 0, 
if if = /jt. 


Both bounds in (5.44) are attained if if = ;7r/n, j ^ Omodn. In addition, the upper 
estimate is sharp for if = j ~/(n — 1), j / 0 mod n — 1. 

Proof. Duffin and Schaeffer [DS] showed that for any real polynomial of degree at most 
n, ($(2r)| < M on [—1,1] implies ^(a)] < A f(R n + l/fl n )/2 for all a € Br. Application 
of this result to $ n (r;a) yields the lower bound in (5.44). In order to obtain the upper 
bound, we consider polynomials $(2) = 7 T n (z) + 5T n -\(z) 6 II n (a) with 7, <5 € R. With 
(5.33) and (5.34), one readily verifies that $ € II„(a) iff 7 and 8 satisfy 


(R n + 1/R n ) cos(nxp) 

( R n 1 4- 1 / R n 1 )cos(n — l)ip 

7 


'2' 





_0_ 


A routine calculation shows that this linear system has a unique solution and that 


max |#(.)| = W+|«| = B< r >(a). 

*€[- 1.11 

Finally, the statements on the sharpness of (5.44) follow from Theorem 5.S. 0 

Note that the bounds in (5.44) are asymptotically optimal, and we have the following 

Corollary 5.10. Let R > 1 and a € Br. Then, 

lim (^'(o)) 1 '" = Km (Bi r| (o))' /n = 4- 

n — o© n— *oo it 

The typical behavior of the optimal values of (5.30) and (5.32) and the bounds stated 
in Theorems 5.7 and 5.9 is illustrated in Figure 5.1. For fixed R = 1.103 . . . and n = 30, 
the four curves 

£.(«) < + 2 < £< r >(«) < a = «(*) € Br, 0 < * < x/2, 

are plotted. Note that E n (a ) = E n (a ) = E n (-a ) (and analogously for ifi r) (a)), and hence 
it suffices to consider only the points a in the first quadrant. 


59 




xpj 'Degree 


Figure 5 . 1 . The optimal values E k (a ) and E* r) (a) of the approximation prob- 
lems (5.30) and (5.32) are shown for the case k = 30 and with a = a( ip) moving 
along the quarter of the ellipse Br , R = 1.103... . The lowest curve is E 3 o{a). 
The other three curves display £ 30 ^( 0 ) and its lower and upper bounds as stated 
in Theorem 5.9. 


The following theorem summarizes our results on error bounds for the MR and ME 
methods. For the special case of matrices A = T + ial with positive definite Hermitian 
part T , we also derive an error bound for the GAL method. 

Theorem 5 . 11 . Let a < A min (T) and /? > A max (T) be given, and assume that 0 < a < P 
if a = 0. Let a be given by (5.31 ), and let R be the unique solution of 


l /r> 1, \[P + <r 2 + Va 2 + <7 2 

2 {R + r )= ’ 


(5.45) 


Then, for n = 1, . . . .' 

a) 

b) 


||6- .4x^11 
||6 - -4i 0 || 


||a:* - io|| 


< E„(a) < Rn + l/Rn . 

(5.46) 

< E[ r l,(a) < B^a). 

(5.47) 
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c) IfT is positive definite, then 


||l* - Jollr -\ 4 < 72 +q 2 (v ^+^) 2 + 


where 



(5.4S ) 


Proof of part c). We set e n = r* — x n and p } = e^T^e n , j = —1,0.1. With (5.1) and 
since r„ = .4e n , one obtains 


e„ r n = + icrpo and ||r„||^_, = p\ + a 2 p -\ . (5.49) 

Now let u 6 i 0 + K n (ro,A) be arbitrary. By (5.2), (u - x n ) H r n = 0, and therefore 

e"r„ = (x. - U ) H r n = - «)) " (T-' 2 r„). (5.50) 

By application of the Cauchy- Schwartz inequality to (5.50) and with (5.49), we arrive at 


p\ + <J 2 pl < III* - U ||j (pi + a p-i). 
Next, recall that, by the Kantorovich inequality ( e.g . [Hou, p. 83]), 


(5.51) 


Vi/J-1 < A* o where s := Q(\/k + -^=)^ 


-1 


(5.52) 


Using (5.52) and the estimate p\/ p-\ > A m j n (T 2 ) = o; 2 , we obtain from (5.51) 


1 + <7 2 /i_i / Pi 


a' +t 7- 


Since u E x 0 + A'„(ro,-4) is arbitrary, |jr+ - u||r in (5.53) can be replaced by 

min Mar* — uNx = min ||$(.4)eo||T- 
u€ro+A' n (ro,/t) < t’6rln • 4>(0)= 1 


(5-53) 


(5.54) 


By expanding e 0 into orthonormal eigenvectors of the normal matrix .4 and with (5.29), 
(5.30), (5.31), and (5.35), we obtain 


•ten 


m $(o)-i - H 60 !! 7, En ( a ^ - H e °H T i?n + Y]~R" 


(5.55) 


Finally, combining (5.53)-(5.55) yields the desired bound (5.48). □ 

We remark that, for the special case of a = 0, (5.48) and (5.46) reduce to the usual 
error bounds (see, e.g ., [Sto] ) for the classical conjugate gradient and conjugate residual 
algorithms. 
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In Theorem 5.11, we excluded the case of Hermitian indefinite matrices .4 = 1. Error 
bounds for this case can be found in Chandra [Cha] for the MR method and m _SF , Frel, 
Szy] for the ME method. 

Finallv, we note that for the GAL method there are no satisfactory error bounds for 
the general class of matrices (5.1). 

5.5. Polynomial preconditioning 

Polynomial preconditioning aims at speeding up the convergence of conjugate gradient 
type methods for the solution of Ax — b by applying them to one of the two equivalent 
linear systems 

T(A)Ax = T(A)b (5.56) 

(left preconditioning), or 

T(A)Ay = 6, x = T(A)y (5.57) 

(right preconditioning). Here T is a suitably chosen polynomial of small degree. For the 
case of Hermitian positive definite .4, Rutishauser [Rut] proposed polynomial precondition- 
ing in the 50’s as a remedy for roundoff in the classical CG algorithm. The revival [JMP] 
of Rutishauser’s method and the general interest in polynomial preconditioning is mainly 
motivated by the attractive features of this technique for vector and parallel computers 
(see [Saa2] for a survey). It is interesting to note that Lanczos seems to have been the 
first to consider polynomial preconditioning. The idea already appeared in his 1953 paper 
[Lan3] which, alas, is never referenced. 

In this section, we study polynomial preconditioning for the class of matrices (5.1) 
A = T A id. Let / > 2 be any fixed integer. We seek a polynomial T £ IT/_i with the 
following two properties: 

(i) the coefficient matrix Y(A)A of (5.56) and (5.57) is again a shifted Hermitian matrix 
of the form (5.1); 

(ii) the convergence of conjugate gradient type methods, applied to the preconditioned 
systems (5.56) or (5.57), is speeded up optimally. 

As in the previous section, let a,f3 € R be given such that 

a < [i < 3 for all eigenvalues p of T, (5.5S) 

and assume that 0 < a < 13 if a = 0. Our criteria for optimal convergence in (ii) will 
be based on (5.58) as the only available information on the spectrum .4 and on the error 
bounds stated in Theorem 5.11. 

First, consider requirement (i). For any T € Hi — i f we can represent Y(.4).4 in the 

form 

T (A)A = (T + z<r/)T(T + id) = 'F(T) + irl, (5.59) 
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with S' € IT; and t € R. Note that T, S', and r are related by 


(fi + ia)Y(fJ. + i<r) = $(n) +it and t := i^{-io). (5.60) 

Since S'(T) is Hermitian if. and only if, S' is a real polynomial, it follows from (5.59) that 
(i) is fulfilled if, and only if, 'I' € nj r) and r 6 R. Therefore, from now on, it is assumed 
that T € II/_i satisfies (5.60) with S' € Il[ r) and r E R. 

Next, we turn to the question of optimal choice of S' and r. A first, very tempting 
strategy is to require r = 0 and to choose S' such that T(A)A = S'(T) is positive definite. 
The preconditioned system (5.56) can then be solved by the standard CG method. Clearly, 
S' (T) « I should approximate the identity matrix as best as possible. Using (5.58) and 
(5.60), we conclude that such an optimal S' is given as the best approximation in 


min max |1 — S'(^i)|. 

'Pen) 0 : 'J'(-i<r) = 0 


(5.61) 


For positive definite matrices A = T, this approach just leads to Rutishauser’s method 
[Rut]. For the non-Hermitian case / 0, (5.61) turns out to be equivalent to the approx- 
imation problem (5.32), and we have the following 

Theorem 5.12. Let a ^ 0 and l > 2. Then, there exists a unique best approximation in 
(5.61 ) given by 


S>*(/i) = 1 - Sq 



+ a - 2p 

j3 — a ’ 


a), 


(3 + or + 2 h7 

j3 — a 


(5.62) 


where S>/(z;a) is the extremal polynomial of (5.32) (for n = l) with optimal value E\ \a) 
(cf. Theorem 5.8). Moreover, the matrix T(A)A = S'(T) is positive definite with eigen- 
values in [1 — £^(a),l + £^(a)], and for the iterates x n of the CG method, applied to 
(5.56), the estimates 


ll J * - ^nlk(T) 2 

||z* — xolk(r) - R n + l/R n ' 


R := 


1 + 




(5.63) 


hold. 

Proof. The linear transformation z(p) = (/? + a — 2p)/(f3 — a) maps [a,/?] onto [—1, 1]. 
Moreover, $(z(p)) = 1 — S'(^) defines a one-to-one correspondence between all S' E Il| r) 
with S'(— icr) = 0 find all real polynomials $ 6 Ilj(a). This shows that (5.61) and (5.32) are 
equivalent (recall that the optimal polynomial for (5.32) is reed), and, hence, S'* is indeed 
the unique best approximation in (5.61). The error bounds (5-63) follow from (5.48) and 
(5.45) (with <7 = 0, a = 1 - El r \a), and f3 = 1 + £, (r) (a))- □ 
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Recall (see Figure 5.1) that for fixed / of moderate size and fixed R, Ej r) {a) strongly 
depends on the position of a on the ellipse Br. In particular, if a is close to the real points 
of the ellipse, E^ r \a) is significantly larger than for the other points of Br. Therefore, 
(5.63) suggests that the polynomial (5.62) will yield a poor preconditioner for matrices .4 
which are nearly Hermitian positive definite. This will be confirmed by numerical results 
presented in Section 7.3. Therefore, in order to obtain a polynomial preconditioner which 
is satisfactory for all a G B r , it is crucial to treat r in (5.59) as a free parameter, and, next, 
we determine optimal choices of p and r for speeding up the MR and ME algorithms. 

First, consider the MR method. For it, right preconditioning (5.57) is the more natural 
choice between (5.56) and (5.57), since residual vectors for (5.57) are also residual vectors 
of the original linear system. Let y„ denote the nth iterate of the MR algorithm applied 
to 7 {A) Ay = b , and set x pp = T (A)y n . Moreover, let x„ be the nth approximation 
generated by the MR method applied to the original system Ax = b. Then, assuming that 
x 0 = x pp , it follows with (5.57) that /\„(T(.4)ro, Y(A).4.) C A' n /(ro, A) and x pp ,x n i G £o + 
K n i(ro,A). Hence, the minimization property (5.3) implies that ||6 — -4.x n /|| < ||6-.4x^ p j|. 


Therefore, in view of (5.46), we conclude that, based on (5.58) as the only information on 
the spectrum of A, the best possible choice of T G IIj_i is one which guarantees the 
estimates 

\\b~Ax pp \ 


< 


1 \b~. 4x 0 1 1 -R nl + 1/R nr 


n = 1 , 2 ,.... 


(5.64) 


with R defined in (5.45). We call T G IIj_i an optimal polynomial preconditioner for the 
MR algorithm if it leads to the error bounds (5.64). 

Similarly, for the ME method with left polynomial preconditioning (5.56), the error 
bounds (5.47) and Corollary 5.10 suggest that the best possible choice of Y G II/_i is one 
for which the iterates’ x pp satisfy 



< 4 + 


n = 1,2,... , 


(5.65) 


for some a G Bri. A polynomial T G Hj_i is called an optimal preconditioner for the ME 
approach if it guarantees (5.65). 

With this notion of optimality, we can now state the main result of this section as 
follows. 


Theorem 5.13. Let l > 2. Then, 


V _ ^t( A ~ i(J ) + iT 

T/-i(A) = r 


(5.66) 


where 

Vi(u) = Ti( — , - — ^)-ReTi(-a) and r = - Im T/(-a), (5.67) 

v p — a 
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is an optima] polynomial preconditioner for the \IR and A/E methods. Here. Ti denotes 
the 1th Chebyshev polynomial (cf. (5.34)) and a is given in (5.62). 

Proof. First, note that, by (5.67), = -ir, and thus (5.66) defines indeed a 

polynomial T 6 II|_i . Next, consider the preconditioned matrix A = T ( .4)^4.. With 
(5.58) and since 7) maps the interval [—1,1] onto itself, it follows that the eigenvalues of 
the Hermitian part ^/(T) of .4 are contained in [<5, /?] where a := — 1 — ReT/(— a) and 
ft := 1 — Re7)(— a). Now we apply Theorem 5.11 (with a = a, 0 = 0, and a = r) and 
note that, by (5.33) and (5.34), 


0 + <5 + 2 ir 
a = = 

(3 — a 


— —Ti(—a) € Br‘ ■ 


The error bounds (5.64) and (5.65) are then an immediate consequence of parts a) and b) 
of Theorem 5.11, respectively. Hence Y/_i is an optimal polynomial preconditioner, and 
the proof is complete. Q 

We remark that, in [ELV], Eiermann, Li, and Varga developed a general theory for 
polynomial preconditioning for asymptotically optimal semi-iterative methods. In particu- 
lar, by means of Theorem 5.13 from [ELV], one can show that the polynomial preconditioner 
(5.66) is also best possible for semi-iterative procedures for the class of matrices (5.1). 

Also, recall that, for the GAL approach, there are in general no error bounds on which 
we could base the choice of a best possible polynomial T. However, in analogy to the case 
of real symmetric matrices (see [SW, SF, Szy]), preconditioning for the GAL method can 
be motivated by its close connection (cf. (5.24)) to the ME algorithm. Therefore, we 
regard (5.66) also as an optimal polynomial preconditioner for the GAL method. 

Finally, note that polynomial preconditioning is easily incorporated into the MR and 
ME/GAL Algorithms 5.3 and 5.4. Right preconditioning leads to slightly more economical 
implementations, and only this choice is considered in the sequel. The idea is to apply the 
CG type methods to the linear system Y;_i(A)Aj/ = b — Axo with starting guess y o = 0. 
The resulting iterates y„ of the MR and ME/GAL approaches are generated by Algorithm 
5.3 and 5.4, respectively, modified in the following way: substitute y n for x n , replace, in 
(5.22), a by r (defined in (5.67)), and finally, in step 2) of Algorithm 5.3 and 5.4, perform 
the following Lanczos recursion 


v = z (n) - d n u n - 0 n V n -\, 

2 


where z^ • — 


:=T,( 




0 - a 0 


Otn . 


a 


v H z {n) 

u n * i 


(5.68) 


and set a n = d n - ReT)(-a). We remark that for this computation only T), but never the 
complex polynomial (5.66), is used. The actual preconditioner T/_i appears only in the 
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translation of the y„ into the corresponding iterates 


X n — Zo + T,_,(A)y n 


(5.69) 


for the original system Ax = b. However, we do not need to generate x n in each step. 
Note that the norm |jr„|| of the residual r n = b — .-lx n is available (cf. Section 5.2) from 
the procedure generating y„, and the iteration is stopped as soon as ||r„|| is sufficiently 
reduced. Hence, x n is computed only once, namely in the very last step of the algorithm. 

Finally, notice that z (n) in (5.68) can be obtained by performing l steps of the classical 
Chebyshev semi-iterative method (see Golub and Varga [GV]). More precisely, setting 




:=r ; ( 




0 — a 0 — a 


T':=T- 


u> : = 


0-a 


(5.70) 


the three-term recurrence formula of the Chebyshev polynomials leads to the following 


Algorithm 5.14. (Computation of in (5.68).) 

0) Set z[ n) = v n and z[ n) = ui T'v n ; 

1) For j =2 ,1, compute 

_( n ) _ o, ,y' 7 ( n ) _ • 

Zj —ZJj. 1 z j-l z j-2> 

2) Set z^ = 2 { n) . 


We remark that the computation of z^ n ^ via Algorithm 5.14 requires 21 matrix- vector 
products T ■ v, v 6 R^, and 21 additional real multiplications. If T and r 0 are real (cf. 
Section 5.3), all rj n) are real too, and the work is halved. 

Similarly, using (5.66), (5.67), and again the three-term recurrence formula of the 
Chebyshev polynomials, a routine calculation shows that the following algorithm just yields 
the iterate (5.69). 

Algorithm 5.15. (Computation of x„ in (5.69).) 

0) Set h ( 0 n) = y„ and h\ n) = 2 u>(T'y n - (*±2 + ia)y n ); 

1 ) For j = 2, — 1 , compute 

h\ n) = TuTh™ - h { ;_\ + 2Tj(—a)y n ; 

2) Set x n — xo + j • 
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6. Complex versus equivalent real linear systems 


In this section, we study connections between (1.1) and its equivalent real versions. Unless 
stated otherwise, .4 is assumed to be a general complex N x N matrix. Recall that, in 
view of (1.10), the iterates of any Krylov subspace method (1.2) for solving (1.1) are of 
the form 

i n = i 0 +$(4)ro, $€ll n _i. (6.1) 


6.1. Equivalent real linear systems 

By taking real and imaginary parts in (1.1), we can rewrite (1.1) as the real linear system 

( 6 - 2 ) 


A* 

A second real version of (1.1) is 


Re x 


Re 6 

J 

Re A 

-ImA' 

Im x 


Im 6 


ImA 

Re A 


Re x 


Re b 

A — 

Re A 

ImA 

— Im x 


Im b 


Im A 

— Re A 


(6.3) 


Obviously, (6.2) and (6.3) are the only essentially different possibilities of rewriting (1.1) 
as a real 2 N x 2N system. Furthermore, note that .4* is nonsymmetric if, and only if. 
.4 ^ A h is non-Hermitian, whereas A.* is symmetric if, and only if, A = A T . Hence, for 
complex symmetric linear systems the approach (6.3) appears to be especially attractive 
since it permits the use of simple CG-type methods such as SYMMLQ and MINRES for 
real symmetric matrices. 

In the following proposition, we collect some simple spectral properties of .4, and A , „. 

Proposition 6.1. 

a) Let J = X~ l AX be the Jordan normal form of A. Then A * has the Jordan normal 
form 


J 0 
0 J 


= XI .4* AT* where X * := 


1 
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X -iX 
-iX X 


In particular, 


A(A*) = A(A) U A(A). 

b) The matrices A** and - A** are similar. In particular, 

-A, A, -A € A(A**) for all A 6 A(A**). 

Moreover, 

A (A**) = {A € C I A 2 € A(AA)}. 


(6.4) 

(6.5) 

( 6 . 6 ) 
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c) Let A = A t be complex symmetric. Then, there exists a singular value decomposition 
(the so-called Takagi SVD) of A of the form 


.4 = UTU T , U unitary , E = diag(<7i , cr 2 , . . . , cr.v) > 0. 
Moreover, .4*. is a real symmetric matrix with spectral decomposition 

A„ = I 7 ,7 117 VII ~ VI "here Y = Re i', Z = Im U 


Y -Z 


E 0 


'Y -Z' 

Z Y 




Z Y 


Proof, a) First, note that 
X„ = S 


X £ 
0 X 


where S := —r= 

V2 


In —Hn 
—Hn In 


is unitary. 


(6.7) 


( 6 . 8 ) 


(6.9) 


In particular, (6.9) shows that with X also X* is nonsingular. One readily verifies that 

5^.5 =lj j 


and, in view of (6.9), this implies (6.4). (6.5) is an obvious consequence of (6.4). 
b) Since 


0 

In 

-1 

A 

0 

In' 

-In 

0 


-In 

0 


the real matrices A±+ and -.4** are similar. Hence, (6.6) holds true. The relation between 
A(.4**) and A(X4) is known (see [HJ, p. 214] for a proof). 

c) (6.7) is the well-known Takagi singular value decomposition for symmetric matrices 
( e.g . [HJ, Corollary 4.4.4]). By rewriting (6.7) in terms of the real and imaginary parts of 
A and U , one obtains (6.8) (cf. [HJ, pp. 212-213]). ._ 

Roughly speaking, Krylov subspace methods are most effective for coefficient matrices 
.4 whose spectrum, except for possibly a few isolated eigenvalues, is contained in a half- 
plane which excludes the origin of the complex plane. On the other hand, if this half- 
plane condition is not satisfied and if a large number of eigenvalues of .4 straddle the 
origin, usually the convergence of CG-type algorithms is prohibitively slow. Typically, in 
these situations (see [Eis, Frel, Fre2] for examples), iterations based on Krylov subspaces 
generated by A offer no advantage over solving the normal equations (1.8) by standard 
CG. See Theorem 6.3 below for a theoretical result along these lines. 

For complex linear systems which arise in practice the half-plane condition is usually 
satisfied. Indeed, mostly 

A(A) C {A € C | ImA > 0). (6.10) 
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However, by rewriting (1.1) as real linear systems (6.2) respectively (6.3), one deliberately 
creates coefficient matrices whose spectra are most unfavorable for Krylov subspace meth- 
ods. The case (6.3) is especially bad since, in view of (6.6). A(.4**) is symmetric with 
respect to real and imaginary axis and hence the eigenvalues always embrace the origin. 
Similarly, by (6.5), the coefficient matrix .4* of (6.2) in general has eigenvalues in the upper 
as well as in the lower half-plane. In particular, if (6.10) holds and, as in most applications, 
the Hermitian part (.4 + A H )/ 2 of .4 is indefinite, the spectrum of .4, straddles the origin 
and the half-plane condition is not satisfied for .4*. The following example illustrates this 
behavior. 

Example 6.1. Consider the subclass of 5.1 of complex symmetric matrices of the form 
.4 = T + ial where T = T T is real and a > 0. (6. 11) 

Obviously, 

A(A) = {A = p + i<r | /x e v(T)} 

C S := [fi m + + iv] (6-12) 

where p m = A min (T) and pm = A max (T). Note that the complex line segment S is parallel 
to the real axis and always contained in the upper half of the complex plane. In view of 
(6.5), (6.12) implies 

A(^4») = {A = p ± ia | p 6 <t(T)} C S U 5. 

We remark that S U 5 is a tandem slit consisting of the two complex intervals S and S 
which are parallel and symmetric to each other with respect to the real axis. Moreover, the 
eigenvalues of .4. straddle the origin, if the Hermitian part T of .4 is indefinite. Finally, 
using (6.11) and part b) of Proposition 6.1, we obtain 

A (A„) = {X = ±vV + <7 2 | A € A(T)} 

c - yftii + <y2 > u [ a - ■ 

Note that the class (6.11) is closely related to shifted skewsymmetric matrices. Indeed, if, 
instead of Ax = 6, we rewrite — iAx = —ib as a reeil system (6.2), one obtains 

(-MJ.-p'f 5, 5:=[“ - T ] (=-5 7 '). (6.13) 

Then, the eigenvalues are contained in a line segment which is parallel to the imaginary 
axis and symmetric with respect to the real axis: 

A((-i,4)*) = {A = <7± ifi | n e A (T)} C [<r - ip, a + ip], p = rnax{|// m |, \pm\}- 
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6.2. Correspondence of Krylov subspace methods 

In analogy to (6.1) for complex linear systems (1.1). a Krylov subspace method for the 
solution of the equivalent real systems (6.2) respectively (6.3) generates iterates 


Rex n 

Imx n 


Re xo 
Im xo 


+ $04*) 


Rer 0 
Imr 0 J ’ 


♦ enfi,, 


respectively 


Rex n 
— Im x n 


Rexo 
— Im xo 


+ $(A..) 


Rer 0 
Imr 0 J ’ 


■*en < „ r 2 1 . 


In the sequel, the notation 

htHc, B ) := {$(B)c | $ 6 1+2, } (c K„(c, B)) 


i6.14) 


(6.15) 


will be used. 

At first glance, it might appear that Krylov subspace iterations (6.1) respectively 
(6.14-6.15) for the original complex systems respectively its equivalent real versions cor- 
respond to each other. However, as the following proposition shows this is not the case in 
general. 

Proposition 6.2. Let n € N. 

a) Let $ G n n _i. Then, i n = x 0 + $(A)r 0 is equivalent to 


Rex„ 
Im x n 


Rex 0 

Imx 0 


+ $i (A.) 


Rer 0 
Im r 0 


+ $ 2 (-4*) 


Im r 0 
- Re r 0 


(6.16) 


where $ = $i + i$ 2 , $ii $2 6 nl-i- 

b) Let $ e Then, (6.15) is equivalent to 

x n = Rei n + i Im x„ = x 0 + $(A.4)ro + T(AA)Ar 0 (6-l~) 


where $ € n ( ^_ 1)/2j and T e n ( L £_ 2)/2J axe defined by $(A) = $(A 2 ) + AT(A 2 ). 
Proof. First, we note that, for j = 0, 1, . . 


(A.y 


Re A> - Im 
Im A* Re 


and 



Re (AAy 
— Im( A Ay 


Im (A Ay 
Re (AAy 


(6.18) 


as is easily verified by induction on j. 
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a) Let 7) and 8 } be the coefficients of the real polynomials $1 and $2* respectively. 
Then. 

n — 1 

Re $(.4) = (V Re AJ ~ 6 J Im AJ )’ 

J=0 (6.19) 

n — 1 

Im $(.4) = ^ (7 ; Im A- 7 + 8 } Re A J ) . 

;=0 

By reformulating x n = tq -f *$(A)ro, by means of (6.19) and the first relation in (6. IS), in 
terms of real and imaginary parts, one immediately obtains (6.16). 

b) A routine calculation, using the second identity in (6.18), shows that (6.15) can be 
rewritten as 


Rex n 


Re xo 

1 

Re{^(AA)r 0 + T(AA)Aro} 

Im x fi 


— Im xo 

“h 

- Im{^(AA)r 0 + T(AA)Ar 0 } 


Hence (6.15) and (6.17) are equivalent. Q 

In view of part a) of Proposition 6.2, the corresponding real equivalent of complex 
Krylov schemes (6.1) are iterations of the type (6.16) and not the obvious real Krylov 
subspace methods (6.14). Clearly, the actual choice of the polynomials in (6.1) respectively 
(6.14-6.15) is aimed at obtaining iterates which are — in a certain sense — best possible 
approximations to the exact solution of the corresponding linear system. By using schemes 
of the type (6.14), from the first, one gives up n of the 2n real parameters which are 
available for optimizing complex Krylov subspace methods (6.1). Consequently, it is always 
preferable to solve the complex system (1.1) rather than the real version (6.2) by Krylov 
subspace methods. Furthermore, numerical tests reveal that the convergence behavior of 
the two approaches can be drastically different (see Chapter 7). 

6.3. A connection between MR and CGNR for complex symmetric matrices 

Now assume that A is a complex symmetric N x N matrix. Then, in view of part c) of 
Proposition 6.1, A** is a real symmetric indefinite matrix whose spectrum is given by 

A(A**) = {±<7y | j = 1,. . . ,N}. (6.20) 

Here crj = Oj{A ) > 0, j = 1, . . . , N, denote the singular values of A. 

Since there are simple extensions, namely SYMMLQ and MINRES, (cf. Section 5.1) 
of classical CG to reed symmetric indefinite matrices, it is especially tempting to solve (6.3) 
by one of these methods. Recall that SYMMLQ generates iterates defined by a Galerkin 
condition, whereas MINRES is based on a minimal residual MR property (cf. (1.3)). Here, 
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we consider only the MR approach. Applied to (6.3) it generates a sequence of iterates 
n = 1,2,..., which are characterized by 


||&** - A** 


mm 

£ G IQ + R n ^ ( r 0* ’ 


||5.+ — A**z||, z n G + hi \ r o (6.21) 


.4,*) 


Here, we have set 


Re b 


Rex n 

Im b 

, z n — 

Im x „ 


for n = 0, 1, . . . , 


6 ** — . 4 *** 0 - 


( 6 . 22 ) 


Roughly speaking, CG-type algorithms for real symmetric indefinite systems converge 
slowly if the coefficient matrix is strongly indefinite, in the sense that it has many positive 
as well as many negative eigenvalues. Unfortunately, since, by (6.20), A(A,*) is even 
symmetric to the origin, A** exhibits this undesirable property. Indeed, numerical tests 
show that the convergence behavior of the MR method (6.21) is practically identical to 
that of the tabooed approach to (1.1) via solving the normal equations (1.8) by standard 
CG [HSj. In the sequel, we refer to this latter method as CGNR. Notice that the iterates 
x n of CGNR are defined by the minimization property 


\\b - Ax/|| 


min lit — Ax||, x/ G xo + Ki(A R ro, A H A). 

x£zo+ r 0 } A H A) 


(6.23) 


Next, we prove that MR and CGNR are even equivalent, if the starting residual Tq* 
satisfies a certain symmetry condition. Note that, corresponding to the spectral decompo- 
sition (6.8), rj* can be expanded into eigenvectors of .4** as follows: 




' C 1 

’Y -Z' 

c with c = 

; 

Z Y 


* 



- ^2n - 


G R 2n . 


(6.24) 


Theorem 6.3. Let x* fR and xf° NR denote the iterates generated by ( 6.21-6.22 ) and 
(6.23), respectively, both started with the same initial guess x 0 G C v . Assume that c m 
the expansion (6.24) of rj* satisfies 

|c>| = |c^+j|, ; = 1,2, ...,N. (6.25) 


Then, 


.CGNR 

c l 


= X 


MR _ MR / _ n i 

2 1 — x 2/+l > * — u, i, . . . 


(6.26) 


Proof. First, note that, in view of (6.8) and (6.24), Cj and c n+J are components corre- 
sponding to a pair of symmetric eigenvalues ±aj of A**. However, for any real symmetric 
linear system A**z = 6** with “symmetric” eigenvalues and “symmetric” starting residual 
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r" in the sense of (6.20) and (6.25). respectively, the MR method generates iterates with 
: n 6 z 0 + A^ /2J (>l„r 0 **,A 2 „) (see. e.g., [Fre2]). Consequently, the iterates defined by 
(6.21) satisfy 

Z2 / == Z2/+1 € z o + I\j ^(-4**r 0 *, *4 2 .). (6-27) 

In particular, by (6.22), (6.27) shows that x%j R = ijV+i • 

It remains to prove the first relation in (6.26). To this end, we remark that 


||6** - A*,z || = ||6 - .4x|| for all 


Rex 
— Im i 


e c 


N 


(6.28) 


Moreover, by using (6.22) and part b) of Proposition 6.2 (applied to polynomials <£(A) = 
AT(A 2 )), we deduce 




+ A'} r) (A 


*+' o 


',A 


2 

+ sta 


J-{ 


Rex 
- Im x 


x € x 0 + K f r) (A h t 0 ,A h A ) } (6.29) 


(notice that A = A H in (6.17)!). In view of (6.27-6.29), (6.21) (for n = 2/) can be rewritten 
in the form 

||6 — .4x£f R || = min ||6 — Ax\\, x%j R G xo + Kj r) (A H r 0 , A H A). (6.30) 

zexo+K\ T) {A H r a ,A H A) 

Finally, remark that the iterates of CGNR always correspond to real polynomials, i.e., 
x OGSR ^ Xo -). k\ t \A h tq, A H A). Hence, by comparing (6.23) with (6.30), we conclude 
that xf 0 ™ = x% R . □ 

Clearly, the special symmetry condition (6.25) will not be satisfied in general. Nev- 
ertheless, all our numerical experiments showed (see Examples 7.3 and 7.4) that (6.26) is 
still fulfilled approximately, i.e., 


CGNR 

X, 


A/H 

X 2 / 


MR 

x 2i+l’ 


/ = 0 , 1 ,... 


(6.31) 
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7. Numerical experiments 


We have performed extensive numerical tests with the QMR algorithm and all the other 
iterative schemes considered in this thesis. In this chapter, we present a few tvpical results 
of these experiments for complex symmetric and shifted Hermitian linear systems arising 
from the Helmholtz equation (1.5). Numerical experiments with the QMR method applied 
to real nonsymmetric matrices are reported in [FNl, FN2]. 

7.1. The test problems 

Consider (1.5) on the unit square G = (0,1) x (0,1) with <J\ € R a constant and oi 
a real coefficient function. First, assume that u satisfies Dirichlet boundary conditions. 
Then, approximating (1.5) by finite differences on a uniform m x m grid with mesh size 
h = l/(m + 1) yields a linear system (1.1) with A an A r x ;V, N - m 2 , matrix of the form 

A = T 4* ih 2 D , T = Ao — <7\h 2 1, D — diag(<ii , d. 2 , ■ • • , d n ). (7.1 ) 

Here Aq is the symmetric positive definite matrix arising from the usual five-point dis- 
cretization of —A and the diagonal elements of D are just the values of 02 at the grid 
points. 

Similarly, if we consider the real Helmholtz equation (1.5), i.e., 02 = 0, but now with 
a typical complex boundary condition such as 

&XL 

— =iau on {(1, y) -1 < y < 1} 
on 

(which is discretized using forward differences) and Dirichlet boundary conditions on the 
other three sides of the boundary of G, one again arrives at (7.1) where 

j f o / h if J — Itti , l 1 , . . . , 771 . (7.2) 

J 1 0 otherwise. 

The test problems presented in this chapter are all linear systems Ax = b with complex 
symmetric coefficient matrices of the type (7.1). Note that (/.l) is also a shifted Hermitian 
matrix if D is a multiple of the identity matrix. 

For Examples 7.1 and 7.5, the mesh size h = 1/64 was chosen, resulting in a 3969x3969 
matrix A. In Examples 7.2-4, h = 1/32 and thus A is a 961 x 961 matrix. Example 7.6 
was run on a 128 x 128 grid leading to a 16384 x 16384 matrix .4. The right-hand side b 
was chosen to be a vector with random components in [-1, 1] + i[-l, 1], with the exception 

of Example 7.2, where b had constant components 1 + i , and of Example 7.5, where the 

exact solution z* was generated with random components in [—1,1] + *(— 1> 1] an ^ 
right-hand side was set to b := Ax *. As starting vector always x 0 = 0 was chosen. 
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As stopping criterion, we used 


Rn ■ = 


l|fe-.4x w ll 
P“ -4x 0 1| 


< 10 ~ 6 . 


(7.3) 


In Figures 7.1-4, the relative residual norm (7.3), R n , is plotted versus the number N n of 
matrix- vector products with *4, .4*, or .4»». Note that N n = n is identical to the iteration 
number, except for CGS respectively CGNR which both require tw f o matrix- vector products 
.4 • v respectively A • v, A-v per iteration and for which N n = 2 n. For GMRES [SS2], work 
and storage per iteration step n grows linearly with n and in practice it is necessary to use 
restarts. In the sequel, GMRES(n 0 ) and GMRES*(n 0 ) refer to complex and real versions 
— restarted after every n 0 iterations — of the GMRES method applied to (1.1) and (6.2), 
respectively. 


7.2. Complex symmetric linear systems 

In a first series of experiments, QMR (with different weighting strategies) and BCG were 
compared. The natural choice (3.7) turned out to be the best strategy in all cases. In the 
following, QMR always refers to Algorithm 3.1 with weights (3.7). Then QMR produces 
residual vectors whose norms are almost monotonically decreasing and generally smaller 
than those of the BCG residuals. However, convergence of QMR and BCG typically 
occurred after a comparable number of iterations. The following example is typical. 

Example 7.1. Here, (7.1) is a 3969 x 3969 matrix with o-j = 200, and the diagonal 
elements of D are given by (7.2) with a = 10. In Figure 7.1, the convergence behavior 
of BCG, QMR. and an unweighted version of the QMR approach (based on the Lanczos 
vectors v n , as generated by the complex symmetric Lanczos Algorithm 4.1) is displayed. 
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Figure 7.1. Convergence behavior of BCG, QMR, and an unweighted version of 
the QMR approach for Example 7.1. 

Next, we compared the CGS Algorithm 4.8 and complex GMRES with QMR and 
BCG. Typically, CGS needed slightly fewer iterations than QMR and BCG to reach (7.3). 
However, per iteration, QMR and BCG require only about half as much work and storage 
and thus CGS is more expensive than QMR or BCG for complex symmetric matrices. Due 
to the necessary restarts, GMRES was never competitive with QMR, BCG, or CGS. 

Example 7.2. In (7.1), we set A = 961, o\ = 100 and dj , j — l,...,n, are chosen as 
random numbers in [0,10]. Figure 7.2 shows the convergence behavior of GMRES(20), 
QMR, BCG, and two runs of CGS with different starting vectors so , namely so = r o 
respectively So with random components in [—1,1] + i [ — 1,1]. Notice the extremely large 
residual norms in the early stage of the CGS iteration. 
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Figure 7.2. Convergence behavior of GMRES(20), QMR, BCG, and two runs 
of GGS with different starting vectors so for Example 7.2. 

In the following two examples, we compared CG-type methods for Ax = b with real 
schemes for the equivalent real systems (6.2) respectively (6.3). 

MR(.4„*) denotes the MR method (6.21) applied to the real symmetric system (6.3). 

Example 7.3. Here, in (7.1), N = 961, cr, = 100, and d } are given by (7.2) with a = 100. 
In Figure 7.3, the convergence behavior of QMR, MR(.4**), GMRES(20), GMRES(5), 
GMRES*(5), and CGNR is shown. Notice that, although the symmetry condition (6.25) 
is not fulfilled, the curves for CGNR and MR(.4**) are almost identical. This confirms 
(6.31). Finally, we tried GMRES(fc 0 ) and GMRES„(fc 0 ) also with other restart parameters 
ko . For this example, both methods did never converge. 
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Figure 7.3. Convergence behavior of QMR, MR(i,*), GMRES(20), GMRES(5), 
GMRES+(5), and CGNR for Example 7.3. 


7.3. Shifted Hermitian linear systems 

Now we choose D — < 72 / in (<.l). Then, (/-l) is a shifted Hermitian matrix of the form 

A = T + iaI, T= Aq -<rih 2 I, a:=a 2 h 2 . (7.4) 

Note that .4 is a shifted Hermitian matrix of the form (6.11) (cf. Example 6.1). In partic- 
ular, .4 belongs to the class of matrices (5.1) and we can apply the algorithms developed 
in Chapter 5 to Ax = b. 

Example 7.4. Let A be the 961 x 961 matrix (7.4) with a x = 1000 and = 100. 
Here, we denote by A/i?(i4) the run with AIR Algorithm o.3 applied to the original sjstem 
Ax — b. Recall that, by rewriting — iAx — —ib as a real system (6.2), one obtains a 
shifted skewsymmetric matrix (6.13), (-L4)*. Again, for such matrices an efficient true 
minimal residual algorithm, denoted by MR((-iA).), exists [EES, Frel]. Figure 7.4 shows 
the convergence behavior of MR(4), MR(.4**), MR(( — i.4)*),CGNR, and GMRES(20). 
Notice that MR((-iA)*) and CGNR are nearly identical. This is typical for the case that 
<r is small compared to the spectral radius of T. Furthermore, if ct = 0, i.e. ( — 1.4)* in 
(6.13) is skewsymmetric, CGNR and MR((— iA)*) are even equivalent [Frel]. 
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Figure 7.4. Convergence behavior of MR(A), MR(A**), MR((— ?A)*),CGNR, 
and GMRES(20) for Example 7.4. 

In the next example, we tested the various polynomial preconditioners discussed in 
Section 5.5. Note that the eigenvalues of Ao are known, and for our experiments with 
polynomial preconditioning we have used the true values 

d = ^min(-^o) — <7 1 h? , p = A max (.4o ) — <7\ h ( ^ - 5) 

of the extreme eigenvalues of T (cf. (5.58)). 

Examples 7.5. The matrix A is 3969 x 3969. For the constants in (7.4), values of the form 
( 7 j = <7i(0), (72 = <72 (^) wore chosen. Here 0 < ip < x/2 is a parameter such that the points 
a(ip) = (P + a + 2 icr)/(P — a) all lie on the same ellipse R > 1 fixed, with v describing 
the position of a(ift ) on Br (see (5.31) and (5.33)). The case ip = 0 corresponds to a 
symmetric positive definite matrix (7.4), and for our experiments, we have chosen R > 1 
such that A = A$ for rp = 0. Moreover, notice that with increasing rp, the symmetric part 
T of (7.4) becomes more and more indefinite and a = —ft for ip — v / 2. Also, the shift <7 
increases with ip. Finally, we remark that the error bounds of Theorem 5.11 suggest that 
the MR and ME methods should display similar convergence rates for all ip. In Tables 
7.1-4, for several values of ip (stated in degree!) and the various CG-type methods, we 
list the number of iterations which were necessary to reach (7.3). A indicates that the 
process still had not converged after 200 steps. In Table 7.1 the results for the MR, ME, 
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and GAL Algorithms 5.3 respectively 5.4 (without preconditioning) are given. The Tables 
7.2, 7.3, and 7.4 display the behavior of the three methods combined with the polynomial 
preconditioner (5.66) with l = 6,11. and 16, respectively. Also listed are the results for 
the ZPCG method consisting of the classical CG algorithm with Zolotarev polynomial 
preconditioner (5.62) (see Theorem 5.12). 


0 / Degree 

0 

5 

10 

15 

20 

25 

30 

35 

40 

45 

MR 

120 

126 

148 

165 

175 

183 

190 

197 

203 

208 

ME 

183 

177 

166 

186 

191 

210 

210 

215 

224 

231 

GAL 

129 

144 

165 

182 

198 

208 

213 

222 

225 

231 


x pj Degree 

50 55 60 65 70 75 80 85 90 

MR 

ME 

GAL 

212 217 221 224 228 232 234 237 239 

236 237 244 245 250 252 259 260 263 

236 240 244 248 253 255 259 261 264 


Table 7.1. Number of iterations after which the various algorithms had reduced 
the norm of the starting residual by 10 -6 . Listed are the numbers for the basic 
methods without preconditioning. The family (depending on the parameter ip) 
of test problems is the one described in Example 7.5. 


tp/ Degree 

0 

5 

10 

15 

20 

25 

30 

35 

40 

45 

PPMR 

47 

47 

47 

47 

47 

47 

48 

47 

47 

47 

PPME 

63 

47 

47 

47 

47 

47 

64 

47 

47 

47 

PPGAL 

49 

49 

49 

49 

50 

50 

50 

50 

50 

49 

ZPCG 

+ 

★ 

148 

99 

74 

59 

49 

56 

62 

63 


xp j Degree 

50 

55 

60 

65 

70 

75 

80 

85 

90 

PPMR 

47 

47 

47 

47 

47 

47 

47 

47 

47 

PPME 

47 

47 

63 

47 

47 

47 

47 

47 

63 

PPGAL 

49 

49 

49 

49 

49 

49 

50 

50 

50 

ZPCG 

59 

53 

48 

53 

57 

58 

56 

52 

49 


Table 7.2. Same as Table 7.1, but with polynomial preconditioning of degree 
1 = 6 . 
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1 1'/ Degree 

0 

5 

10 

15 

20 

25 

30 

35 

40 

45 

PPMR 

26 

26 

26 

26 

26 

26 

26 

26 

26 

26 

PPME 

33 

26 

27 

29 

26 

26 

28 

27 

26 

27 

PPGAL 

28 

28 

28 

28 

28 

28 

28 

28 

28 

28 

ZPCG 

* 

87 

44 

29 

32 

34 

29 

29 

31 

29 


0/ Degree 

50 

55 

60 

65 

70 

75 

so 

85 

90 

PPMR 

26 

26 

26 

26 

26 

26 

26 

26 

26 

PPME 

30 

27 

26 

30 

27 

26 

26 

28 

27 

PPGAL 

28 

28 

28 

28 

28 

28 

28 

28 

28 

ZPCG 

27 

30 

29 

27 

29 

29 

27 

2S 

29 


Table 7.3. Same as Table 7.1, but with polynomial preconditioning of degree 
/ = 11 . 


0/ Degree 

0 

5 

10 

15 

20 

25 

30 

35 

40 

45 

PPMR 

IS 

18 

18 

18 

18 

18 

18 

18 

18 

18 

PPME 

23 

19 

18 

18 

17 

17 

18 

18 

17 

23 

PPGAL 

20 

20 

19 

19 

20 

20 

19 

19 

19 

20 

ZPCG 

146 

41 

21 

23 

21 

20 

21 

19 

20 

19 


0 / Degree 

50 

55 

60 

65 

70 

75 

80 

85 

90 

PPMR 

18 

18 

18 

18 

18 

18 

18 

18 

18 

PPME 

17 

18 

18 

17 

17 

17 

17 

17 

23 

PPGAL 

19 

19 

19 

19 

19 

19 

19 

19 

20 

ZPCG 

20 

19 

20 

19 

19 

20 

19 

20 

19 


Table 7.4. Same as Table 7.1, but with polynomial preconditioning of degree 
l = 16. 

From these results, we draw the following conclusions. If used without preconditioning, 
the MR method appears to be superior to the ME and GAL approaches. However, note 
that the stopping criterion (7.3) is based on the norm of the residual, and this is more 
favorable for the MR method. A comparison based on the Euclidean norm of the error 
vector x* — x„ displays a similar convergence behavior for the ME and MR approaches. 
In combination with polynomial preconditioning, the performance of all three methods 
PPMR, PPME, and PPGAL is nearly identical. Also, note that the polynomial (5.66) 
yields a very efficient preconditioner which reduces the number of iterations significantly 
in all examples. Finally, as already suspected in the previous section, the strategy leading 
to the ZPCG method is a very dangerous one, and the algorithm even fails to converge if 
A is close to a positive definite matrix. 
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Examples 7.6. Here .4 is a 163S4 x 163S4 matrix of the form (7.4) with cr i — a o = 100. 
We applied the PPMR method based on the MR Algorithm 5.3 combined with polynomial 
preconditioning (5.66) of various degrees /. This example was run on a massively parallel 
computer, the CM-2, with 16,3S4 processors. In Figure 7.5, we plot the number of iter- 
ations after which the PPMR method had reached (7.3) versus l. In Figure 7.6. we olot 
the actual computing time (in seconds) versus l. Clearly, polynomial preconditioning :s an 
efficient technique on the CM-2. 



Figure 7.5. Number of iterations for PPMR versus the degree l of the precon- 
ditioner for Example 7.6. 
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Figure 7.6. Actual computing time (in seconds) for PPMR on the CM-2 versus 
the degree l of the preconditioner for Example 7.6. 


We conclude this section with two further remarks, all the results for the PPMR, 
PPME, and PPGAL methods were obtained with right polynomial preconditioning (RPP) 
(cf. (5.57)). Experiments with left polynomial preconditioning (LPP) (see (5.56)) gave 
nearly identical results. However, since implementations of RPP are slightly more eco- 
nomical, we therefore recommend RPP over LPP. Finally, recall that for our tests, the 
true extreme eigenvalues (7.5) of T were used. Of course, in general, such information is 
not available. However, it is possible to obtain good estimates of these quantities after 
relatively few steps of the Hermitian Lanczos Algorithm 5.1. 
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8. Concluding remarks 


Complex non-Hermitian linear systems arise in important applications, such as tne numer- 
ical solution of the complex Helmholtz equation. Often their coefficient matrices exhibit 
special structures, such as complex symmetry, or they are shifted Hermitian matrices. Here, 
we have considered Krylov subspace methods for the solution of complex non-Hermitian 
linear systems. 

First, we have presented a novel Krylov subspace iteration, the QMR method, for 
general nonsingular non-Hermitian linear systems. The method uses a recently proposed 
[FGN, FN1] robust implementation of the look-ahead Lanczos algorithm to generate basis 
vectors for the Krylov subspaces K n (r Q ,A). The QMR iterates are characterized by a 
quasi-minimal residual property over i^ n (ro, A). Both the look-ahead Lanczos algorithm 
and the computation of the actual QMR iterates can be implemented using only short 
recurrences. The QNIR approach is closely related to the BOG algorithm, however, unlike 
BCG, the QMR algorithm has smooth convergence curves and good numerical properties. 
Furthermore, we have derived bounds for the QMR residuals which are essentially the 
same as the standard bounds for GMRES. To the best of our knowledge, this is the first 
convergence result for a BCG-like algorithm for general non-Hermitian matrices. 

Second, we discussed various CG-type methods designed for two special classes of 
complex non-Hermitian matrices. In particular, we have shown that work and storage 
for the QMR and BCG methods is roughly halved for complex symmetric linear systems. 
For shifted Hermitian matrices, we have investigated three different CG-type approaches 
with iterates defined by a minimal residual property, a Galerkin type condition, and an 
Euclidean error minimization. Numerically stable implementations were proposed and 
error bounds were derived for all three methods. Moreover, it was shown how the special 
shift structure can be preserved by using polynomial preconditioning, and results on the 
optimal choice of the polynomial preconditioner were given. 

It is very tempting (and often done in practice!) to avoid complex linear system bj 
solving equivalent real systems instead. We have presented some theoretical and numerical 
results which show that this — at least for Krylov subspace methods — is a fatal approach. 
Typically, the resulting real systems are unequally harder to solve by conjugate gradient 
type algorithms than the original complex ones. 

An important question, that we have not addressed here, is how to construct efficient 
preconditioners for complex symmetric linear systems, such as the ones arising from the 
complex Helmholtz equation. This will be the subject of a forthcoming report. 
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